Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

eks dev Sat, 31 Jan 2009 12:47:47 -0800

Thanks for confirming it.

That is good to know, and I am sure there are good reasons for it
(performance). Anyhow, it sounds like a good mousetrap that probably deserves
a few comments in the javadoc.
 
- The fact that a term exists in the term dictionary does not mean that any
live documents actually contain it. This bites people who use external IDs and
take the shortcut of checking whether a document is in the index by checking
whether its ID term is in the term dictionary, and spell checkers that index
terms taken from the index...


- Stats are stale and change over time (I have seen comments about this
somewhere).
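To make the two points above concrete, here is a toy sketch in plain Java (no Lucene dependency; `ToySegment` and all of its methods are made up for illustration) of a segment where a delete only flips a bit. The term dictionary still "contains" an ID whose only document is gone, and docFreq keeps counting the deleted doc until a merge. In Lucene itself, as far as I know, `IndexReader.docFreq(Term)` behaves like the stale count here, while iterating `IndexReader.termDocs(Term)` skips deleted docs like `hasLiveDoc()` does. The idf line assumes DefaultSimilarity's `1 + ln(numDocs / (docFreq + 1))`, fed with maxDoc.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, NOT Lucene's API: deleting a document only sets a
// bit, so the term dictionary and the docFreq statistic are left untouched.
public class ToySegment {
    final Map<String, int[]> postings = new HashMap<>(); // term -> sorted doc ids
    final BitSet deleted = new BitSet();
    final int maxDoc;

    ToySegment(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void add(String term, int... docs) {
        postings.put(term, docs);
    }

    // Cheap delete: stats and lexicon are NOT updated.
    void delete(int doc) {
        deleted.set(doc);
    }

    // The tempting shortcut: "the ID term exists, so the document exists".
    boolean termExists(String term) {
        return postings.containsKey(term);
    }

    // Stale statistic: still counts deleted documents.
    int docFreq(String term) {
        int[] docs = postings.get(term);
        return docs == null ? 0 : docs.length;
    }

    // Reliable check: walk the postings and skip deleted documents.
    boolean hasLiveDoc(String term) {
        int[] docs = postings.get(term);
        if (docs == null) {
            return false;
        }
        for (int doc : docs) {
            if (!deleted.get(doc)) {
                return true;
            }
        }
        return false;
    }

    // Assumed DefaultSimilarity-style idf: 1 + ln(maxDoc / (docFreq + 1)).
    static double idf(int docFreq, int maxDoc) {
        return 1.0 + Math.log(maxDoc / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment(10);
        seg.add("id:42", 3);          // external ID indexed as a term
        seg.add("lucene", 0, 1, 3);
        seg.delete(3);

        System.out.println(seg.termExists("id:42")); // true, yet...
        System.out.println(seg.hasLiveDoc("id:42")); // false: its only doc is deleted
        System.out.println(seg.docFreq("lucene"));   // 3: doc 3 is still counted

        // Once a merge drops doc 3, maxDoc and docFreq both shrink, so idf
        // (and therefore scores) for "lucene" change.
        System.out.println(idf(3, 10) + " -> " + idf(2, 9));
    }
}
```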

As a luxury option (none of this is really a big deal), maybe an idea would be
to have some sort of lightweight optimize, "refreshStatsAndLexicon()", that
just brings the stats and the term dictionary into a consistent state without
touching postings, stored fields and other heavy things?
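A rough sketch of what such a lightweight refresh could do, again on a toy structure in plain Java. Only the method name comes from the proposal above; the class, fields and logic are hypothetical and not part of Lucene. It recounts live docs per term, drops terms that have no live docs left, and fixes numDocs, while the postings arrays (deleted ids included) stay as they are and deleted docs are still filtered at search time.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the proposed refreshStatsAndLexicon(): fix the term
// dictionary and docFreq stats after deletes WITHOUT rewriting postings or
// stored fields. Deleted ids stay in the postings and are still filtered out
// at search time; only the cheap "metadata" becomes consistent again.
public class StatsRefresher {
    final TreeMap<String, int[]> lexicon = new TreeMap<>();   // term -> postings
    final TreeMap<String, Integer> docFreq = new TreeMap<>(); // cached stats
    final BitSet deleted = new BitSet();
    final int maxDoc;
    int numDocs;

    StatsRefresher(int maxDoc) {
        this.maxDoc = maxDoc;
        this.numDocs = maxDoc;
    }

    void add(String term, int... docs) {
        lexicon.put(term, docs);
        docFreq.put(term, docs.length);
    }

    void delete(int doc) {
        deleted.set(doc); // only a bit flip; stats go stale
    }

    void refreshStatsAndLexicon() {
        Iterator<Map.Entry<String, int[]>> it = lexicon.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, int[]> entry = it.next();
            String term = entry.getKey(); // read before any remove()
            int live = 0;
            for (int doc : entry.getValue()) {
                if (!deleted.get(doc)) {
                    live++;
                }
            }
            if (live == 0) {
                it.remove();             // term no longer matches any live doc
                docFreq.remove(term);
            } else {
                docFreq.put(term, live); // corrected statistic
            }
        }
        numDocs = maxDoc - deleted.cardinality();
    }

    public static void main(String[] args) {
        StatsRefresher s = new StatsRefresher(5);
        s.add("apple", 0, 2);
        s.add("kiwi", 4);
        s.delete(2);
        s.delete(4);
        s.refreshStatsAndLexicon();
        System.out.println(s.lexicon.keySet()); // [apple]
        System.out.println(s.docFreq);          // {apple=1}
        System.out.println(s.numDocs);          // 3
    }
}
```

Note that this still reads every posting once; it only avoids the writes that a real merge or optimize would do.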


Having clarified this, back to the original question: I am now 95% sure that
"Deleted Docs as Filters" will be faster for queries with more than one
term/clause, and equally fast for single-term queries. The remaining 5%
uncertainty comes from the performance difference between skipTo() and
get(int i). IMO, this could only be visible for single-term queries in the
high-density case, and maybe not even there...
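The two variants I am comparing, as a toy sketch in plain Java (no Lucene classes; all names are made up): (a) check every candidate from the postings against the deleted-docs bit set with a get(int)-style lookup, and (b) treat the live docs as one more iterator and leapfrog it against the query's postings with skipTo()-style advances, the way a filter would be intersected. Both must return the same hits; the sketch says nothing about which is faster, which is exactly the remaining 5%.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy comparison, NOT Lucene code: two ways of hiding deleted docs.
public class DeletedDocsAsFilter {

    // (a) Random access: look up every candidate with get(int).
    static List<Integer> randomAccess(int[] postings, BitSet deleted) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : postings) {
            if (!deleted.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // (b) Deleted docs as a filter: leapfrog the query's postings against an
    // iterator over live docs; whichever side is behind skips to the other.
    static List<Integer> leapfrog(int[] postings, BitSet deleted, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int live = deleted.nextClearBit(0); // first live doc
        while (i < postings.length && live < maxDoc) {
            int doc = postings[i];
            if (doc == live) {
                hits.add(doc);
                i++;
                live = deleted.nextClearBit(live + 1);
            } else if (doc < live) {
                i = skipTo(postings, i, live);    // advance the query side
            } else {
                live = deleted.nextClearBit(doc); // advance the filter side
            }
        }
        return hits;
    }

    // A linear scan stands in for a real skipTo(), which would use skip data.
    static int skipTo(int[] postings, int from, int target) {
        while (from < postings.length && postings[from] < target) {
            from++;
        }
        return from;
    }

    public static void main(String[] args) {
        int[] postings = {1, 3, 4, 7, 9};
        BitSet deleted = new BitSet();
        deleted.set(3);
        deleted.set(7);
        System.out.println(randomAccess(postings, deleted)); // [1, 4, 9]
        System.out.println(leapfrog(postings, deleted, 10)); // [1, 4, 9]
    }
}
```

With several clauses, variant (b) would presumably add the live-docs iterator once to the top-level conjunction instead of checking deletions inside every term's postings, which is where I expect the gain.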

 



----- Original Message ----
> From: Michael McCandless <luc...@mikemccandless.com>
> To: java-dev@lucene.apache.org
> Sent: Saturday, 31 January, 2009 21:01:55
> Subject: Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or 
> top level Query
> 
> 
> Right, we just filter out the docs when iterating through postings.
> 
> So this means, as segments are merged, the stats get corrected, which means 
> document scores will change for a given query.
> 
> Mike
> 
> Mark Miller wrote:
> 
> > eks dev wrote:
> >> "...many core unit tests will need to change, or.."
> >> 
> >> Thinking about it a bit more, what is the current contract for deleted
> >> documents with respect to terms?
> >> 
> >> If we delete a document from an index, do we update global freqs and
> >> eventually delete terms... or do we simply say the document ID will not
> >> be found again? I guess freqs stay unchanged until we merge segments? It
> >> is probably described somewhere in the javadocs or wiki, but I do not
> >> remember seeing it. It may be important in some cases.
> >> 
> > All the stats stay unchanged. I think we just filter the id. I've def
> > seen that the stats are unchanged and everything is still loaded in
> > FieldCache and what not. Until the deletes are merged out anyway.
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-dev-h...@lucene.apache.org
> > 
> 




