I've encountered some very puzzling Lucene behavior (I'm using 1.3dev1, 
StandardAnalyzer, QueryParser).  

My indexed documents have, among other fields, two Text fields (indexed, tokenized, 
stored) called "pub_date" and "id".  These two fields have similar values.  A typical 
pub_date value is "20021121" and a typical id value is "20021121_4477".

When I search on pub_date, everything is normal.  The search for the query 
"pub_date:20021121" responds in less than a second with about 200 hits (only 25 of 
which are displayed).  Changing the query to "pub_date:2002112*" generates about 1000 
hits with about the same response time.  Changing it to "pub_date:200211*" increases 
the hits to about 5000 with little increase in response time.  Finally, using 
"pub_date:2002*" generates about 35,000 hits and requires about two seconds or so.

However,..... when I try the same kinds of searches the 'id' field, the results are 
*very* different.  

Searching against the query "id:20021121*" generates the same approximately 200 hits 
in less than a second.  Using the query "id:2002112*" generates about 1000 hits, but 
it takes about two seconds.  Changing the query to "id:200211*" produces about 500 
hits, but requires about seven seconds.  Generalizing the query to "id:2002*" causes 
my application to crash with an out of memory error.

As mentioned above, the two fields are very similar in content and are indexed in the 
same way.  One behaves as expected but the other one doesn't work well at all.

Any ideas on what's going on here?  Is "id" some kind of reserved Lucene word?

Regards,

Terry


Reply via email to