Hi,

I just started with Lucene today, and the first thing I did was try out the small demo. I followed the instructions in "Getting started - Building and Installing the Basic Demo" to the letter: I downloaded the JAR files (2.3.2), unpacked them, and launched the indexer on the src directory. That worked fine, and it indexed all the Java files in the directory and its subdirectories.

I didn't try to search for a swearword, but I did try to search for "vector". The fact that I got only one result, whereas the demo says I should get a bunch of them, isn't really the problem. The problem is that I got only one result although the word "vector" appears in TWO documents:

src/demo/org/apache/lucene/demo/html/HTMLParser.java
src/demo/org/apache/lucene/demo/SearchFiles.java

(I checked that with grep.)
When I enter my query, I get a very clear answer:

Enter query: vector
Searching for: vector
1 total matching documents
1. src/demo/org/apache/lucene/demo/SearchFiles.java

grep's version:

[silenos:apache/lucene/demo] veda> pwd
/home/veda/lucene/lucene-2.3.2/src/demo/org/apache/lucene/demo
[silenos:apache/lucene/demo] veda> grep -i vector * */*
SearchFiles.java: * are all identical, then single norm vector may be shared. */
html/HTMLParser.java: private java.util.Vector jj_expentries = new java.util.Vector();
[silenos:apache/lucene/demo] veda>

So my question is a very easy one: what happened?

Is there special processing for Java files, as there is for HTML documents, which leaves comments out?

Is that a bug only in the "demo" part of this small program? (This would be surprising, as other queries seem to be working fine.)

Is there actually a way I can check the content of my index: which files were actually indexed, or search for one file in particular? Something like a field search, but on the URI of the file itself. (I think I read that this is implementation-dependent, meaning one could do it programmatically, but it's not in the demo, right?)

Anyway, thanks for your answers. I hope there is a good one to this question, because I'd feel rather deceived if a search engine so obviously ignored some results...

David

--
View this message in context: http://www.nabble.com/Preliminary%2C-fundamental-question-about-the-demo-tp19367781p19367781.html
Sent from the Lucene - General mailing list archive at Nabble.com.
