As I previously tried to explain, I have a custom query for some pre-cached terms, which I load into RAM in an efficient compressed form. I need this for faster searching and also for much faster faceting. So what I do is process the incoming query and replace certain sub-queries with my own "CachedTermQuery" objects, which extend Query. Since these are not per-segment, I only want scorer.Score(collector) called once, not once for each segment in my index. Essentially, what happens now when I run a search is that it collects the same documents N times, once for each segment. Is there any way to combine different Scorers/Collectors such that I can control when collection enumerates over multiple sub-readers and when it does not? This all worked in previous versions of Lucene because enumerating sub-indexes (segments) was pushed down to a lower level inside the Lucene API, whereas now it has been lifted to a higher level.
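To make the idea concrete, here is a minimal sketch of the kind of re-basing I have in mind (plain C#, no Lucene.Net types; CachedPostingsSegmenter, sortedGlobalDocs, docBase and maxDoc are just illustrative names, not anything from my production code). It assumes the cached postings are a sorted array of top-level doc IDs; for each segment the searcher visits, only the slice that falls inside that segment is replayed, shifted into segment-relative space, so each cached document is collected exactly once:

using System;
using System.Collections.Generic;

// Sketch only: re-bases cached top-level doc IDs into one segment's space.
public static class CachedPostingsSegmenter
{
    // docBase = sum of MaxDoc() of all earlier segments; maxDoc = this segment's size.
    public static IEnumerable<int> SegmentRelativeDocs(int[] sortedGlobalDocs, int docBase, int maxDoc)
    {
        // Find the first cached doc that belongs to this segment.
        int start = Array.BinarySearch(sortedGlobalDocs, docBase);
        if (start < 0) start = ~start;

        // Emit only the docs inside [docBase, docBase + maxDoc), shifted to be segment-relative.
        for (int i = start; i < sortedGlobalDocs.Length && sortedGlobalDocs[i] < docBase + maxDoc; i++)
        {
            yield return sortedGlobalDocs[i] - docBase;
        }
    }
}

If something like this sat behind the scorer that my CachedTermQuery's weight hands out per segment, Collect() would still be called per segment, but each cached document would be visited once and always with an ID below that segment's MaxDoc().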
Thanks
Bob

On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote:

> I found the problem. The problem is that I have a custom "query optimizer"
> that replaces certain TermQuery instances within a Boolean query with a
> custom Query, and this query has its own weight/scorer that retrieves
> matching documents from an in-memory cache (which is not Lucene-backed).
> But it looks like my custom hit collectors are now wrapped in a
> HitCollectorWrapper which assumes Collect() needs to be called for multiple
> segments - so it is adding a start offset to the doc ID that comes from my
> custom query implementation. I looked at the new Collector class and it
> seems to work the same way (it assumes it needs to set the next index
> reader with some offset). How can I make my custom query work with the new
> API (so that there is basically a single "segment" in RAM that my query
> uses, but other query clauses in the same Boolean query still use multiple
> Lucene segments)? I am sure that is not clear and will try to provide more
> detail soon.
>
> Thanks
> Bob
>
>
> On Jun 9, 2011, at 1:48 PM, Digy wrote:
>
>> Sorry, no idea. Maybe optimizing the index with 2.9.2 can help to detect
>> the problem.
>> DIGY
>>
>> -----Original Message-----
>> From: Robert Stewart [mailto:[email protected]]
>> Sent: Thursday, June 09, 2011 8:40 PM
>> To: <[email protected]>
>> Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>>
>> I tried converting the index using IndexWriter as follows:
>>
>> Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath + "_2.9",
>>     new Lucene.Net.Analysis.KeywordAnalyzer());
>>
>> writer.SetMaxBufferedDocs(2);
>> writer.SetMaxMergeDocs(1000000);
>> writer.SetMergeFactor(2);
>>
>> writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] {
>>     new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) });
>>
>> writer.Commit();
>>
>> That seems to work (I get what looks like a valid index directory at least).
>>
>> But still, when I run some tests using IndexSearcher I get the same problem
>> (I get documents in Collect() which are larger than IndexReader.MaxDoc()).
>> Any idea what the problem could be?
>>
>> BTW, this is a problem because I look up some fields (date ranges, etc.) in
>> some custom collectors which filter out documents, and they assume I don't
>> get any documents larger than maxDoc.
>>
>> Thanks,
>> Bob
>>
>>
>> On Jun 9, 2011, at 12:37 PM, Digy wrote:
>>
>>> One more point: some write operations using Lucene.Net 2.9.2 (add, delete,
>>> optimize, etc.) automatically upgrade your index to 2.9.2.
>>> But if your index is somehow corrupted (e.g., due to some bug in 1.9),
>>> this may result in data loss.
>>>
>>> DIGY
>>>
>>> -----Original Message-----
>>> From: Robert Stewart [mailto:[email protected]]
>>> Sent: Thursday, June 09, 2011 7:06 PM
>>> To: [email protected]
>>> Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>>>
>>> I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment
>>> index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I
>>> get IndexOutOfRange exceptions in my collectors. It is giving me document
>>> IDs that are larger than maxDoc.
>>>
>>> My index contains 377831 documents, and IndexReader.MaxDoc() is returning
>>> 377831, but I get documents from Collect() with larger values (for
>>> instance 379018). Is an index built with Lucene.Net 1.9 compatible with
>>> 2.9.2?
>>> If not, is there some way I can convert it? (In production we have many
>>> indexes containing about 200 million docs, so I'd rather convert existing
>>> indexes than rebuild them.)
>>>
>>> Thanks
>>> Bob
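To restate the part that trips me up in code: this is roughly how I understand a collector is meant to work with the per-segment API in 2.9 (just a sketch, using the SetScorer/Collect/SetNextReader/AcceptsDocsOutOfOrder signatures as I read them; GlobalDocIdCollector and Hits are illustrative names, not my production classes). Collect() should only ever see segment-relative IDs below the current reader's MaxDoc(), and adding the docBase from SetNextReader() is what produces the index-wide ID - so getting 379018 back from a 377831-document index suggests an offset is being added to IDs that are already index-wide:

using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Sketch of a 2.9-style collector: records index-wide doc IDs by adding
// the per-segment docBase supplied in SetNextReader().
public class GlobalDocIdCollector : Collector
{
    private readonly List<int> hits = new List<int>();
    private IndexReader currentReader;  // segment reader, usable for field lookups
    private int currentDocBase;         // sum of MaxDoc() of all earlier segments

    public List<int> Hits { get { return hits; } }

    public override void SetScorer(Scorer scorer)
    {
        // Scores are not needed for this sketch.
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        // Called once per segment before its documents are collected.
        currentReader = reader;
        currentDocBase = docBase;
    }

    public override void Collect(int doc)
    {
        // 'doc' is relative to currentReader, so it should always be smaller
        // than currentReader.MaxDoc(); adding docBase gives the index-wide ID.
        hits.Add(currentDocBase + doc);
    }

    public override bool AcceptsDocsOutOfOrder()
    {
        return true;
    }
}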
