Having an indexed field seems to incur significant overhead when merging, and if the index is highly interactive, merging happens quite often.
Maybe I am incorrect regarding the overhead of indexed fields? I have attempted to keep the number of indexed fields to a minimum.

I think it boils down to whether being able to do a range query (date filtering, for example) is worth the cost of maintaining that index. If the other terms are mildly rare, then inspecting the matching documents against the needed range seems more efficient (thus the need to turn Filter into an interface). But if the term they are looking for is common, then the indexed date range would be needed (to avoid a scan of all documents matching the term).

It may just be that all fields need to be indexed in order to cover all cases (and that the cost of doing a range filter on an indexed field is far less in ALL cases than inspecting any documents).

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 17, 2006 6:19 PM
To: java-dev@lucene.apache.org
Subject: Re: non indexed field searching?

On May 17, 2006, at 11:20 AM, Robert Engels wrote:
> I reviewed the solr source (a LOT of the code is amazingly similar to our
> own search server).
>
> I don't see anything related to searching using non-indexed fields. Could
> you maybe point me at the class(es) that implement this functionality?

Sorry, I missed the "non" part of "non-indexed fields". I don't quite understand why you wouldn't just index every field if you needed that capability though.

    Erik

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 16, 2006 6:35 PM
> To: java-dev@lucene.apache.org
> Subject: Re: non indexed field searching?
>
> On May 16, 2006, at 3:37 PM, Robert Engels wrote:
>> It seems that maybe a query could be separated into Filter and Query
>> clauses (similar to how the query optimizer works in Nutch). Clauses that
>> were based on non-indexed fields would be converted to a Filter.
>>
>> The problem is if you have something like
>>
>> (indexed:somevalue OR nonindexed:somevalue)
>>
>> would require a complete visit to every document.
>
> Not necessarily. A query optimizer could extract these term query
> clauses, look up cached doc sets (bit sets) and union them. Scoring is the
> trickier part - I'm now curious to dig into Solr and see how it handles
> this.
>
>> I understand that this is moving Lucene closer to a database, but it is
>> just very difficult to perform some complex queries efficiently without
>> it.
>
> Check out Solr - I think you'll find it fits this niche nicely.
>
>> *** As an aside, I still don't understand why Filter is not an interface
>
> I saw that Paul Elschot has just done some refactoring work attached to a
> JIRA issue on this very topic.
>
> Erik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
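
The trade-off discussed in this thread (a range filter over an indexed date field versus inspecting the stored date of each document that matches the term) can be sketched roughly as follows. This is plain Java with no Lucene classes: `RangeFilterSketch`, `byIndexedRange` and `byInspection` are hypothetical names, `java.util.BitSet` only stands in for a cached doc set, and dates are assumed to be stored as `yyyyMMdd` integers.

```java
import java.util.BitSet;

public class RangeFilterSketch {

    // Strategy 1 (assumed): the date field is indexed, so a doc set for the
    // range already exists (or is cached) and is intersected with the doc set
    // of the term. Cost is a bitwise AND, independent of how common the term is.
    public static BitSet byIndexedRange(BitSet termDocs, BitSet rangeDocs) {
        BitSet result = (BitSet) termDocs.clone();
        result.and(rangeDocs);
        return result;
    }

    // Strategy 2 (assumed): the date field is only stored, so every document
    // matching the term is inspected. Cost grows with the number of term matches.
    public static BitSet byInspection(BitSet termDocs, int[] storedDates, int from, int to) {
        BitSet result = new BitSet(storedDates.length);
        for (int doc = termDocs.nextSetBit(0); doc >= 0; doc = termDocs.nextSetBit(doc + 1)) {
            int date = storedDates[doc];
            if (date >= from && date <= to) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stored date per document id (made-up sample data).
        int[] storedDates = {20060510, 20060516, 20060517, 20060401, 20060517};

        // Documents matching the search term.
        BitSet termDocs = new BitSet();
        termDocs.set(1);
        termDocs.set(3);
        termDocs.set(4);

        // Stand-in for the doc set an indexed date field would provide.
        BitSet rangeDocs = new BitSet();
        for (int doc = 0; doc < storedDates.length; doc++) {
            if (storedDates[doc] >= 20060516 && storedDates[doc] <= 20060517) {
                rangeDocs.set(doc);
            }
        }

        System.out.println(byIndexedRange(termDocs, rangeDocs));                      // prints {1, 4}
        System.out.println(byInspection(termDocs, storedDates, 20060516, 20060517)); // prints {1, 4}
    }
}
```

Both strategies return the same documents; they differ only in where the cost falls. The first pays at index time (and on every merge), the second at query time in proportion to how common the term is.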