Thanks for confirming it. That is good to know, and I am sure there are good reasons for it (performance). Anyhow, it sounds like a good mousetrap that probably deserves a few comments in the javadoc:
- From the fact that a term exists in the term dictionary, one cannot conclude that there are actual (live) documents containing it. Think of people who use external IDs and take a shortcut, checking whether a document exists in the index by checking whether its ID term exists in the term dictionary, or of spell checkers that index terms taken from the index.
- Stats are stale and change over time (I have seen comments about this somewhere).

As a luxury option (none of this is really a big deal), maybe we could have some sort of lightweight optimize, say "refreshStatsAndLexicon()", that just brings the stats and the term dictionary into a consistent state without touching postings, stored fields and other heavy things?

With this clarified, back to the original question: I am now 95% sure that "Deleted Docs as Filters" will be faster for queries with more than one term/clause, and equally fast for single-term queries. The remaining 5% uncertainty comes from the performance difference between skipTo() and get(int i). IMO this could be visible only for single-term queries in the high-density case, and maybe not even there...

----- Original Message ----
> From: Michael McCandless <luc...@mikemccandless.com>
> To: java-dev@lucene.apache.org
> Sent: Saturday, 31 January, 2009 21:01:55
> Subject: Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query
>
> Right, we just filter out the docs when iterating through postings.
>
> So this means, as segments are merged, the stats get corrected, which means
> document scores will change for a given query.
>
> Mike
>
> Mark Miller wrote:
>
> > eks dev wrote:
> >> "...many core unit tests will need to change, or.."
> >>
> >> Thinking about it a bit more, what is current contract for deleted
> >> documents in respect to terms?
> >>
> >> if we delete document from an index, do we update global freqs and
> >> eventually delete terms... or we simply say document ID will not be found
> >> again? I guess freqs stay unchanged until we merge segments? It is probably
> >> somewhere in javadocs or wiki, but I do not remember I have seen it
> >> somewhere described. It may be important in some cases.
> >>
> > All the stats stay unchanged. I think we just filter the id. I've def seen
> > that the stats are unchanged and everything is still loaded in FieldCache
> > and what not. Until the deletes are merged out anyway.
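The two javadoc points above, plus Mike's remark that stats only get corrected when segments are merged, can be illustrated with a small toy model. This is a hypothetical, self-contained sketch and NOT Lucene's API: StaleStatsDemo, ToySegment, docFreq, termExists, liveDocs and merged are made-up names that only mimic the behaviour described in this thread (a delete just sets a bit, the term dictionary and docFreq stay unchanged, and a merge drops deleted docs together with terms that no longer have any posting).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model (NOT Lucene's API) of one segment: a term dictionary
// with postings, plus a bit set that marks deleted documents.
public class StaleStatsDemo {
    static final class ToySegment {
        final TreeMap<String, int[]> postings = new TreeMap<>(); // term -> sorted doc ids
        final BitSet deleted = new BitSet();
        final int maxDoc;

        ToySegment(int maxDoc) { this.maxDoc = maxDoc; }

        // Index documents; each document is given as a list of terms.
        ToySegment(List<List<String>> docs) {
            this.maxDoc = docs.size();
            Map<String, List<Integer>> tmp = new TreeMap<>();
            for (int doc = 0; doc < docs.size(); doc++) {
                for (String term : new TreeSet<>(docs.get(doc))) {
                    tmp.computeIfAbsent(term, t -> new ArrayList<>()).add(doc);
                }
            }
            tmp.forEach((term, ids) ->
                    postings.put(term, ids.stream().mapToInt(Integer::intValue).toArray()));
        }

        // A delete only sets a bit: postings, docFreq and the term dictionary stay as they were.
        void delete(int doc) { deleted.set(doc); }

        // Counts every posting, deleted or not (the "stale" statistic).
        int docFreq(String term) {
            int[] ids = postings.get(term);
            return ids == null ? 0 : ids.length;
        }

        // Answers "is the term in the dictionary?", not "does a live doc contain it?".
        boolean termExists(String term) { return postings.containsKey(term); }

        // Deleted docs are filtered out while walking the postings.
        List<Integer> liveDocs(String term) {
            List<Integer> out = new ArrayList<>();
            for (int doc : postings.getOrDefault(term, new int[0])) {
                if (!deleted.get(doc)) out.add(doc);
            }
            return out;
        }

        // Simplified merge: drop deleted docs, renumber the survivors and drop
        // terms left without any posting, so the stats match the live docs again.
        ToySegment merged() {
            int[] newId = new int[maxDoc];
            int next = 0;
            for (int doc = 0; doc < maxDoc; doc++) {
                newId[doc] = deleted.get(doc) ? -1 : next++;
            }
            ToySegment out = new ToySegment(next);
            postings.forEach((term, ids) -> {
                int[] live = Arrays.stream(ids).filter(d -> !deleted.get(d)).map(d -> newId[d]).toArray();
                if (live.length > 0) out.postings.put(term, live);
            });
            return out;
        }
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment(List.of(
                List.of("id:A", "lucene"),
                List.of("id:B", "lucene"),
                List.of("id:C", "solr")));
        seg.delete(1); // delete the document whose external id is B

        System.out.println(seg.termExists("id:B")); // true: term is still in the dictionary
        System.out.println(seg.liveDocs("id:B"));   // []: but no live document contains it
        System.out.println(seg.docFreq("lucene"));  // 2: still counts the deleted document

        ToySegment merged = seg.merged();
        System.out.println(merged.termExists("id:B")); // false
        System.out.println(merged.docFreq("lucene"));  // 1
    }
}
```

Note that merged() rewrites the postings too, which is exactly the heavy work a lightweight refreshStatsAndLexicon() would want to avoid; the sketch only shows which numbers are stale, not how to refresh them cheaply.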
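The "Deleted Docs as Filters" idea can be sketched as a plain conjunction. Again a hypothetical toy and NOT Lucene's API: DocIter, ArrayIter, LiveDocsIter, checkEachDoc and intersect are made-up names, with DocIter only loosely modelled on DocIdSetIterator. checkEachDoc() does what Mike describes (walk the postings and test each doc against the deleted bits, one get(int i) per posting), while intersect() treats the live docs as one more sorted iterator and leapfrogs through both with skipTo(). Both return the same documents; the sketch does not measure which one is faster.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical toy code (NOT Lucene's API) comparing two ways to hide deleted docs.
public class DeletedDocsAsFilterDemo {
    // Minimal doc-id iterator, loosely modelled on DocIdSetIterator.
    interface DocIter {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int next();             // advance by one, return the new doc
        int skipTo(int target); // advance to the first doc >= target (moves at least once)
    }

    // Iterates a sorted postings list; skipTo scans linearly for brevity
    // (real postings would use skip data).
    static final class ArrayIter implements DocIter {
        private final int[] docs;
        private int pos = -1;
        ArrayIter(int[] docs) { this.docs = docs; }
        private int current() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        public int next() { pos++; return current(); }
        public int skipTo(int target) {
            do { pos++; } while (pos < docs.length && docs[pos] < target);
            return current();
        }
    }

    // Live docs (everything not deleted) exposed as just another sorted iterator.
    static final class LiveDocsIter implements DocIter {
        private final BitSet deleted;
        private final int maxDoc;
        private int cur = -1;
        LiveDocsIter(BitSet deleted, int maxDoc) { this.deleted = deleted; this.maxDoc = maxDoc; }
        public int next() { return skipTo(cur + 1); }
        public int skipTo(int target) {
            if (cur == NO_MORE_DOCS) return cur;
            int c = deleted.nextClearBit(Math.max(target, cur + 1));
            cur = c < maxDoc ? c : NO_MORE_DOCS;
            return cur;
        }
    }

    // Current approach: walk the postings and test each doc with a random-access get(doc).
    static List<Integer> checkEachDoc(int[] postings, BitSet deleted) {
        List<Integer> out = new ArrayList<>();
        for (int doc : postings) {
            if (!deleted.get(doc)) out.add(doc);
        }
        return out;
    }

    // Filter approach: leapfrog intersection of the query's iterator with the live-docs iterator.
    static List<Integer> intersect(DocIter a, DocIter b) {
        List<Integer> out = new ArrayList<>();
        int da = a.next();
        int db = b.next();
        while (da != DocIter.NO_MORE_DOCS && db != DocIter.NO_MORE_DOCS) {
            if (da == db) {
                out.add(da);
                da = a.next();
                db = b.next();
            } else if (da < db) {
                da = a.skipTo(db);
            } else {
                db = b.skipTo(da);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(3);
        deleted.set(7);
        int[] postings = {1, 3, 4, 7, 9};
        System.out.println(checkEachDoc(postings, deleted)); // [1, 4, 9]
        System.out.println(intersect(new ArrayIter(postings), new LiveDocsIter(deleted, 10))); // [1, 4, 9]
    }
}
```

With several required clauses, the live-docs iterator would presumably be just one more participant in the leapfrog, so the deleted-docs check rides along with the skipTo() calls the conjunction makes anyway; for a single-term query it adds skipTo() calls where the current path does one get(int i) per posting, which is where the remaining uncertainty sits.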
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org