Grant Ingersoll-6 wrote:
>
> When you are indexing the file and adding the Document, you will need
> to parse out your filename per your regular expression, and then
> create the appropriate field:
>
> Document doc = new Document()
> String cat = getCategoryFromFileName(inputFileName)
> doc.add(new Field("category", cat, ...)
> //do the rest of your adds
>
> Just locate where in the demo the Document add is taking place (I
> forget the exact spot) and then add in the appropriate stuff from
> above. Obviously, you need to implement the method I stubbed called
> getCategoryFromFileName.
>
> HTH,
> Grant
>

Thanks, Grant. That was just the hint I needed. I found that the fields are populated in HTMLDocument. I added:

    doc.add(new Field("category", "test", Field.Store.YES, Field.Index.TOKENIZED));

and then used Luke to verify that this field had been added. It had.

Now I am trying to get a quick-and-dirty way of setting the field based on the filename, but I'm running into problems that I don't really understand well enough to fix quickly. I have only very limited experience of Java programming, so I might be using the wrong terms, but I think the problem is variable scope. I get a compilation error:

    HTMLDocument.java:86: cannot find symbol
    symbol  : variable url
    location: class org.apache.lucene.demo.HTMLDocument
        if (url.indexOf("-ov-") != -1) {

I thought I'd be able to use a simple mechanism based on indexOf() to check the existence of a short sequence of characters within the filename. For example, "-sys-". I know that this sequence, if it exists anywhere in the full path must be in the filename. So I put in a series of if statements like this:

    if (url.indexOf("-sys-") != -1) {
        string category = "system";
    }

then right at the end:

    doc.add(new Field("category", category, Field.Store.YES, Field.Index.TOKENIZED));

Am I right in thinking that the variable url is undefined at this point in the code? It certainly seems to be defined earlier on in the file:

    public static String uid2url(String uid) {
        String url = uid.replace('\u0000', '/'); // replace nulls with slashes
        return url.substring(0, url.lastIndexOf('/')); // remove date from end
    }

Is there some way for me to perhaps chop down to the filename here, and make that available later in the code?

K.

--
View this message in context: http://www.nabble.com/Create-and-populate-a-field-when-indexing-tf4713018.html#a13667927
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
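
On the scope question: in Java a local variable is visible only inside the block that declares it. The url in uid2url exists only inside that method, and a category declared inside an if block is gone by the time the closing brace is reached (the type is also spelled String, with a capital S). One way the stubbed getCategoryFromFileName might be written is sketched below. This is a minimal sketch, not the demo's own code: the class name, the "-ov-" to "overview" mapping and the "uncategorized" fallback are all assumptions made for illustration; only "-sys-" to "system" comes from the message above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; it is not part of the Lucene demo.
class CategoryFromFileName {

    // Marker substring -> category name. Only "-sys-" -> "system" is taken
    // from the post; the "-ov-" category name is an assumption.
    private static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put("-sys-", "system");
        MARKERS.put("-ov-", "overview"); // assumed category name
    }

    // Assumed fallback for files that carry none of the markers.
    static final String DEFAULT_CATEGORY = "uncategorized";

    static String getCategoryFromFileName(String path) {
        // Keep only the part after the last '/'. lastIndexOf returns -1 when
        // there is no slash, so substring(0) keeps the whole string.
        String fileName = path.substring(path.lastIndexOf('/') + 1);

        // Declared before the loop and the if, so it is still in scope
        // at the return statement.
        String category = DEFAULT_CATEGORY;
        for (Map.Entry<String, String> marker : MARKERS.entrySet()) {
            if (fileName.indexOf(marker.getKey()) != -1) {
                category = marker.getValue();
                break; // first matching marker wins
            }
        }
        return category;
    }
}
```

Assuming the method in HTMLDocument that builds the Document receives the file as a java.io.File (called f here, which is an assumption about the demo source), the result could be passed straight to the Field constructor already shown above, for example with CategoryFromFileName.getCategoryFromFileName(f.getName()) in place of the literal "test". That avoids needing the url variable at all.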