Grant Ingersoll-6 wrote:
>
> When you are indexing the file and adding the Document, you will need
> to parse out your filename per your regular expression, and then
> create the appropriate field:
>
> Document doc = new Document()
> String cat = getCategoryFromFileName(inputFileName)
> doc.add(new Field("category", cat, ...)
> //do the rest of your adds
>
> Just locate where in the demo the Document add is taking place (I
> forget the exact spot) and then add in the appropriate stuff from
> above. Obviously, you need to implement the method I stubbed called
> getCategoryFromFileName.
>
> HTH,
> Grant
>
Thanks, Grant. That was just the hint I needed.
I found that the fields are populated in HTMLDocument.
I added:
doc.add(new Field("category", "test", Field.Store.YES,
Field.Index.TOKENIZED));
and then used Luke to verify that this field had been added. It had.
Now I am trying to get a quick-and-dirty way of setting the field based on
the filename, but I'm running into problems that I don't really understand
well enough to fix quickly.
I have only very limited experience of Java programming, so I might be using
the wrong terms, but I think the problem is variable scope. I get a
compilation error:
HTMLDocument.java:86: cannot find symbol
symbol : variable url
location: class org.apache.lucene.demo.HTMLDocument
if (url.indexOf("-ov-") != -1) {
I thought I'd be able to use a simple mechanism based on indexOf() to check
for a short sequence of characters within the filename, for example "-sys-".
I know that this sequence, if it exists anywhere in the full path, must be
in the filename.
So I put in a series of if statements like this:
if (url.indexOf("-sys-") != -1) {
    string category = "system";
}
then right at the end:
doc.add(new Field("category", category, Field.Store.YES,
Field.Index.TOKENIZED));
Am I right in thinking that the variable url is undefined at this point in
the code? It certainly seems to be defined earlier on in the file:
public static String uid2url(String uid) {
    String url = uid.replace('\u0000', '/'); // replace nulls with slashes
    return url.substring(0, url.lastIndexOf('/')); // remove date from end
}
Is there some way for me to perhaps chop down to the filename here, and make
that available later in the code?
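Something along these lines is roughly what I am after. It is only a sketch and untested against the demo. The method name is Grant's stub; the class name CategoryHelper and the "other" default are placeholders of my own. The idea is to pass the path in as a parameter rather than rely on the url variable, which looks to be local to uid2url, and to declare category outside the if block so that it is still visible afterwards.

```java
import java.io.File;

// Hypothetical helper class, not part of the Lucene demo.
class CategoryHelper {

    // Derives a category from a marker embedded in the filename.
    static String getCategoryFromFileName(String path) {
        // Chop the full path down to the bare filename.
        String name = new File(path).getName();

        // Declared outside the if block so it is still in scope at the return.
        // "other" is a placeholder default, an assumption on my part.
        String category = "other";

        if (name.indexOf("-sys-") != -1) {
            category = "system";
        }
        // Further markers such as "-ov-" would follow the same pattern.

        return category;
    }
}
```

If the method in HTMLDocument that builds the Document has the File available (I am assuming it does), the result could then be passed as the second argument of the "category" Field in place of the hard-coded "test", for example getCategoryFromFileName(f.getPath()) for a File named f.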
K.