As I previously tried to explain, I have a custom query for some pre-cached terms, which I load into RAM in an efficient compressed form. I need this for faster searching and also for much faster faceting. So what I do is process the incoming query and replace certain sub-queries with my own "CachedTermQuery" objects, which extend Query. Since these are not per-segment, I only want scorer.Score(collector) called once, not once for each segment in my index. Essentially, what happens now when I run a search is that it collects the same documents N times, once for each segment. Is there any way to combine different Scorers/Collectors such that I can control when collection enumerates over multiple sub-readers and when it does not? This all worked in previous versions of Lucene because enumerating sub-indexes (segments) was pushed down to a lower level inside the Lucene API, whereas now it has been lifted to a higher level.
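To make the idea concrete, here is a minimal sketch of the kind of re-basing I have in mind (plain C#, no Lucene.Net types; CachedPostingsSegmenter, sortedGlobalDocs, docBase and maxDoc are just illustrative names, not anything from my production code). It assumes the cached postings are a sorted array of top-level doc IDs; for each segment the searcher visits, only the slice that falls inside that segment is replayed, shifted into segment-relative space, so each cached document is collected exactly once:

using System;
using System.Collections.Generic;

// Sketch only: re-bases cached top-level doc IDs into one segment's space.
public static class CachedPostingsSegmenter
{
    // docBase = sum of MaxDoc() of all earlier segments; maxDoc = this segment's size.
    public static IEnumerable<int> SegmentRelativeDocs(int[] sortedGlobalDocs, int docBase, int maxDoc)
    {
        // Find the first cached doc that belongs to this segment.
        int start = Array.BinarySearch(sortedGlobalDocs, docBase);
        if (start < 0) start = ~start;

        // Emit only the docs inside [docBase, docBase + maxDoc), shifted to be segment-relative.
        for (int i = start; i < sortedGlobalDocs.Length && sortedGlobalDocs[i] < docBase + maxDoc; i++)
        {
            yield return sortedGlobalDocs[i] - docBase;
        }
    }
}

If something like this sat behind the scorer that my CachedTermQuery's weight hands out per segment, Collect() would still be called per segment, but each cached document would be visited once and always with an ID below that segment's MaxDoc().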
Thanks
Bob

On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote:

> I found the problem. The problem is that I have a custom "query optimizer"
> that replaces certain TermQuery instances within a Boolean query with a
> custom Query, and this query has its own weight/scorer that retrieves
> matching documents from an in-memory cache (which is not Lucene-backed).
> But it looks like my custom hit collectors are now wrapped in a
> HitCollectorWrapper which assumes Collect() needs to be called for multiple
> segments - so it is adding a start offset to the doc ID that comes from my
> custom query implementation. I looked at the new Collector class and it
> seems to work the same way (it assumes it needs to set the next index
> reader with some offset). How can I make my custom query work with the new
> API (so that there is basically a single "segment" in RAM that my query
> uses, but other query clauses in the same Boolean query still use multiple
> Lucene segments)? I am sure that is not clear and will try to provide more
> detail soon.
>
> Thanks
> Bob
>
>
> On Jun 9, 2011, at 1:48 PM, Digy wrote:
>
>> Sorry, no idea. Maybe optimizing the index with 2.9.2 can help to detect
>> the problem.
>> DIGY
>>
>> -----Original Message-----
>> From: Robert Stewart [mailto:[email protected]]
>> Sent: Thursday, June 09, 2011 8:40 PM
>> To: <[email protected]>
>> Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>>
>> I tried converting the index using IndexWriter as follows:
>>
>> Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath + "_2.9",
>>     new Lucene.Net.Analysis.KeywordAnalyzer());
>>
>> writer.SetMaxBufferedDocs(2);
>> writer.SetMaxMergeDocs(1000000);
>> writer.SetMergeFactor(2);
>>
>> writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] {
>>     new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) });
>>
>> writer.Commit();
>>
>> That seems to work (I get what looks like a valid index directory at least).
>>
>> But still, when I run some tests using IndexSearcher I get the same problem
>> (I get documents in Collect() which are larger than IndexReader.MaxDoc()).
>> Any idea what the problem could be?
>>
>> BTW, this is a problem because I look up some fields (date ranges, etc.) in
>> some custom collectors which filter out documents, and they assume I don't
>> get any documents larger than maxDoc.
>>
>> Thanks,
>> Bob
>>
>>
>> On Jun 9, 2011, at 12:37 PM, Digy wrote:
>>
>>> One more point: some write operations using Lucene.Net 2.9.2 (add, delete,
>>> optimize, etc.) automatically upgrade your index to 2.9.2.
>>> But if your index is somehow corrupted (e.g., due to some bug in 1.9),
>>> this may result in data loss.
>>>
>>> DIGY
>>>
>>> -----Original Message-----
>>> From: Robert Stewart [mailto:[email protected]]
>>> Sent: Thursday, June 09, 2011 7:06 PM
>>> To: [email protected]
>>> Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>>>
>>> I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment
>>> index (non-optimized). When I run Lucene.Net 2.9.2 on top of that index, I
>>> get IndexOutOfRange exceptions in my collectors. It is giving me document
>>> IDs that are larger than maxDoc.
>>>
>>> My index contains 377831 documents, and IndexReader.MaxDoc() is returning
>>> 377831, but I get documents from Collect() with larger values (for
>>> instance 379018). Is an index built with Lucene.Net 1.9 compatible with
>>> 2.9.2?
>>> If not, is there some way I can convert it? (In production we have many
>>> indexes containing about 200 million docs, so I'd rather convert existing
>>> indexes than rebuild them.)
>>>
>>> Thanks
>>> Bob
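To restate the part that trips me up in code: this is roughly how I understand a collector is meant to work with the per-segment API in 2.9 (just a sketch, using the SetScorer/Collect/SetNextReader/AcceptsDocsOutOfOrder signatures as I read them; GlobalDocIdCollector and Hits are illustrative names, not my production classes). Collect() should only ever see segment-relative IDs below the current reader's MaxDoc(), and adding the docBase from SetNextReader() is what produces the index-wide ID - so getting 379018 back from a 377831-document index suggests an offset is being added to IDs that are already index-wide:

using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Sketch of a 2.9-style collector: records index-wide doc IDs by adding
// the per-segment docBase supplied in SetNextReader().
public class GlobalDocIdCollector : Collector
{
    private readonly List<int> hits = new List<int>();
    private IndexReader currentReader;  // segment reader, usable for field lookups
    private int currentDocBase;         // sum of MaxDoc() of all earlier segments

    public List<int> Hits { get { return hits; } }

    public override void SetScorer(Scorer scorer)
    {
        // Scores are not needed for this sketch.
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        // Called once per segment before its documents are collected.
        currentReader = reader;
        currentDocBase = docBase;
    }

    public override void Collect(int doc)
    {
        // 'doc' is relative to currentReader, so it should always be smaller
        // than currentReader.MaxDoc(); adding docBase gives the index-wide ID.
        hits.Add(currentDocBase + doc);
    }

    public override bool AcceptsDocsOutOfOrder()
    {
        return true;
    }
}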
