No, I will try it though.

Thanks,
Bob

On Jun 10, 2011, at 12:37 PM, Digy wrote:

> Have you tried to use Lucene.Net as is, before working on optimizing your
> code? There are a lot of speed improvements in it since 1.9.
> There is also a Faceted Search project in contrib.
> (https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search)
>
> DIGY
>
> -----Original Message-----
> From: Robert Stewart [mailto:[email protected]]
> Sent: Friday, June 10, 2011 7:14 PM
> To: <[email protected]>
> Subject: [Lucene.Net] Score(collector) called for each subReader - but not
> what I need
>
> As I previously tried to explain, I have a custom query for some pre-cached
> terms, which I load into RAM in an efficient compressed form. I need this
> for faster searching and also for much faster faceting. So what I do is
> process the incoming query and replace certain sub-queries with my own
> "CachedTermQuery" objects, which extend Query. Since these are not
> per-segment, I only want scorer.Score(collector) called once, not once for
> each segment in my index. Essentially, what happens now is that a search
> collects the same documents N times, once for each segment. Is there any
> way to combine different Scorers/Collectors such that I can control when
> collection is enumerated over multiple sub-readers and when it is not?
> This all worked in previous versions of Lucene because enumerating
> sub-indexes (segments) was pushed to a lower level inside the Lucene API;
> now it is elevated to a higher level.
>
> Thanks
> Bob
>
>
> On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote:
>
>> I found the problem. I have a custom "query optimizer" which replaces
>> certain TermQuerys within a BooleanQuery with a custom Query, and this
>> query has its own weight/scorer that retrieves matching documents from an
>> in-memory cache (which is not Lucene-backed). But it looks like my custom
>> hit collectors are now wrapped in a HitCollectorWrapper, which assumes
>> Collect() needs to be called for multiple segments - so it is adding a
>> start offset to the doc ID that comes from my custom query implementation.
>> I looked at the new Collector class and it seems to work the same way (it
>> assumes it needs to set the next index reader with some offset). How can I
>> make my custom query work with the new API (so that there is basically a
>> single "segment" in RAM that my query uses, but other query clauses in the
>> same BooleanQuery still use multiple Lucene segments)? I am sure that is
>> not clear; I will try to provide more detail soon.
>>
>> Thanks
>> Bob
>>
>>
>> On Jun 9, 2011, at 1:48 PM, Digy wrote:
>>
>>> Sorry, no idea. Maybe optimizing the index with 2.9.2 can help to detect
>>> the problem.
>>> DIGY
>>>
>>> -----Original Message-----
>>> From: Robert Stewart [mailto:[email protected]]
>>> Sent: Thursday, June 09, 2011 8:40 PM
>>> To: <[email protected]>
>>> Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>>>
>>> I tried converting the index using IndexWriter as follows:
>>>
>>> Lucene.Net.Index.IndexWriter writer = new IndexWriter(
>>>     TestIndexPath + "_2.9", new Lucene.Net.Analysis.KeywordAnalyzer());
>>>
>>> writer.SetMaxBufferedDocs(2);
>>> writer.SetMaxMergeDocs(1000000);
>>> writer.SetMergeFactor(2);
>>>
>>> writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] {
>>>     new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) });
>>>
>>> writer.Commit();
>>>
>>> That seems to work (I get what looks like a valid index directory at
>>> least). But still, when I run some tests using IndexSearcher, I get the
>>> same problem (I get documents in Collect() which are larger than
>>> IndexReader.MaxDoc()). Any idea what the problem could be?
>>>
>>> BTW, this is a problem because I look up some fields (date ranges, etc.)
>>> in some custom collectors which filter out documents, and they assume I
>>> don't get any documents larger than maxDoc.
>>>
>>> Thanks,
>>> Bob
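
For context on the MaxDoc symptom: in Lucene 2.9 the searcher walks the index
one segment at a time, so Collect() receives segment-local document IDs and
SetNextReader() supplies the docBase offset of the current segment. Below is a
minimal collector sketch under that assumption (the class and field names are
illustrative, not code from this thread, and it assumes the 2.9 Collector
signatures):

    using System.Collections.Generic;
    using Lucene.Net.Index;
    using Lucene.Net.Search;

    // Collects index-wide document IDs by applying the per-segment docBase
    // that Lucene 2.9 passes to SetNextReader before each segment is scored.
    public class OffsetAwareCollector : Collector
    {
        private int docBase;                        // offset of the current segment
        private readonly List<int> hits = new List<int>();

        public override void SetScorer(Scorer scorer)
        {
            // scores are not needed for this sketch
        }

        public override void SetNextReader(IndexReader reader, int docBase)
        {
            this.docBase = docBase;                 // called once per segment
        }

        public override void Collect(int doc)
        {
            hits.Add(docBase + doc);                // doc is segment-local; make it index-wide
        }

        public override bool AcceptsDocsOutOfOrder()
        {
            return true;
        }

        public IList<int> Hits
        {
            get { return hits; }
        }
    }

This mapping is only correct when every scorer feeding the collector emits
segment-local IDs. A scorer that already produces index-wide IDs (as the
CachedTermQuery described above appears to) gets offset a second time by
HitCollectorWrapper, which would produce exactly the doc > MaxDoc values
reported here.
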
>>> On Jun 9, 2011, at 12:37 PM, Digy wrote:
>>>
>>>> One more point: some write operations using Lucene.Net 2.9.2 (add,
>>>> delete, optimize, etc.) automatically upgrade your index to 2.9.2.
>>>> But if your index is somehow corrupted (e.g., due to some bug in 1.9),
>>>> this may result in data loss.
>>>>
>>>> DIGY
>>>>
>>>> -----Original Message-----
>>>> From: Robert Stewart [mailto:[email protected]]
>>>> Sent: Thursday, June 09, 2011 7:06 PM
>>>> To: [email protected]
>>>> Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>>>>
>>>> I have a Lucene index created with Lucene.Net 1.9. It is a multi-segment
>>>> (non-optimized) index. When I run Lucene.Net 2.9.2 on top of that index,
>>>> I get IndexOutOfRange exceptions in my collectors. It is giving me
>>>> document IDs that are larger than maxDoc.
>>>>
>>>> My index contains 377831 documents, and IndexReader.MaxDoc() is
>>>> returning 377831, but I get documents from Collect() with larger values
>>>> (for instance 379018). Is an index built with Lucene.Net 1.9 compatible
>>>> with 2.9.2? If not, is there some way I can convert it? (In production
>>>> we have many indexes containing about 200 million docs, so I'd rather
>>>> convert existing indexes than rebuild them.)
>>>>
>>>> Thanks
>>>> Bob
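
On the conversion question: the AddIndexesNoOptimize approach quoted earlier
in the thread can simply be repeated per index directory. A rough sketch along
those lines (the root path, directory suffix, and analyzer are placeholders,
and it reuses the same Lucene.Net 2.9.2 calls shown above rather than any
separate upgrade tool):

    using System.IO;
    using Lucene.Net.Analysis;
    using Lucene.Net.Index;

    // Rewrite each old-format index into a fresh directory using 2.9.2,
    // so the copies are written in the 2.9 segment format.
    foreach (string oldPath in System.IO.Directory.GetDirectories(@"D:\indexes"))  // hypothetical root
    {
        IndexWriter writer = new IndexWriter(oldPath + "_2.9", new KeywordAnalyzer());
        try
        {
            writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] {
                new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(oldPath))
            });
            writer.Optimize();   // optional: merge down to fewer segments while rewriting
            writer.Commit();
        }
        finally
        {
            writer.Close();
        }
    }

Per Digy's note above, write operations (add, delete, optimize, etc.) run with
2.9.2 also upgrade an index in place, so an in-place Optimize() on each
existing directory is an alternative if rewriting copies of ~200 million
documents is too costly - with the caveat Digy raises about possible data loss
if an index is already corrupted.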
