Yes, this collector won't perform well if you have many matches, since its memory usage is linear in the number of matches. A better option would be to extend e.g. IntComparator and implement getNumericDocValues so that it returns a fake NumericDocValues instance that, for example, does a bit mix of the doc id and a per-request seed (HPPC's BitMixer can do that: https://github.com/carrotsearch/hppc/blob/master/hppc/src/main/java/com/carrotsearch/hppc/BitMixer.java ).
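
Untested sketch of that idea, assuming the 5.x API where NumericDocValues exposes get(int docID) and IntComparator has the (numHits, field, missingValue) constructor; the class name and "random" sort field are made up, and a murmur3-style finalizer stands in for BitMixer:

    import java.io.IOException;
    import java.util.Random;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.index.NumericDocValues;
    import org.apache.lucene.search.*;

    class RandomOrderComparator extends FieldComparator.IntComparator
    {
        private final int seed;

        RandomOrderComparator(int numHits, int seed)
        {
            // The field name is never read from the index; a null missing
            // value keeps NumericComparator from looking up docs-with-field.
            super(numHits, "random", null);
            this.seed = seed;
        }

        @Override
        protected NumericDocValues getNumericDocValues(LeafReaderContext context,
                String field) throws IOException
        {
            final int docBase = context.docBase;
            return new NumericDocValues()
            {
                @Override
                public long get(int docID)
                {
                    // Bit mix of the global doc id and the per-request seed
                    // (murmur3 fmix32 here; HPPC's BitMixer would do as well).
                    int h = (docBase + docID) ^ seed;
                    h ^= h >>> 16;
                    h *= 0x85ebca6b;
                    h ^= h >>> 13;
                    h *= 0xc2b2ae35;
                    h ^= h >>> 16;
                    return h;
                }
            };
        }
    }

Wired up through a FieldComparatorSource, this returns hits in a per-request random order while keeping memory bounded by the requested number of hits rather than the total number of matches:

    final int seed = new Random().nextInt();
    Sort randomSort = new Sort(new SortField("random", new FieldComparatorSource()
    {
        @Override
        public FieldComparator<?> newComparator(String fieldname, int numHits,
                int sortPos, boolean reversed) throws IOException
        {
            return new RandomOrderComparator(numHits, seed);
        }
    }));
    TopDocs hits = searcher.search(query, maxHitsRequired, randomSort);
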
On Thu, Feb 1, 2018 at 12:31, Atul Bisaria <atul.bisa...@ericsson.com> wrote:

> Hi Adrien,
>
> Thanks for your reply.
>
> I have also tried testing with UsageTrackingQueryCachingPolicy, but did
> not observe a significant change in either latency or throughput.
>
> Given my specific search requirements of no scoring and returning the
> search results in a random order (the reason for the custom Sort object),
> I have also explored writing a custom collector and observed quite a
> difference in latency figures.
>
> Let me know if this custom collector code has any loopholes which I could
> be missing:
>
> class RandomOrderCollector extends SimpleCollector
> {
>     private int maxHitsRequired;
>     private int docBase;
>
>     private List<Integer> matches = new ArrayList<Integer>();
>
>     public RandomOrderCollector(int maxHitsRequired)
>     {
>         this.maxHitsRequired = maxHitsRequired;
>     }
>
>     @Override
>     public boolean needsScores()
>     {
>         return false;
>     }
>
>     @Override
>     public void collect(int doc) throws IOException
>     {
>         matches.add(docBase + doc);
>     }
>
>     @Override
>     protected void doSetNextReader(LeafReaderContext context) throws IOException
>     {
>         super.doSetNextReader(context);
>         this.docBase = context.docBase;
>     }
>
>     public List<Integer> getHits()
>     {
>         Collections.shuffle(matches);
>         maxHitsRequired = Math.min(matches.size(), maxHitsRequired);
>         return matches.subList(0, maxHitsRequired);
>     }
> }
>
> Best Regards,
> Atul Bisaria
>
> -----Original Message-----
> From: Adrien Grand [mailto:jpou...@gmail.com]
> Sent: Wednesday, January 31, 2018 6:33 PM
> To: java-user@lucene.apache.org
> Subject: Re: Increase search performance
>
> Hi Atul,
>
> On Tue, Jan 30, 2018 at 16:24, Atul Bisaria <atul.bisa...@ericsson.com>
> wrote:
>
> > 1. Using ConstantScoreQuery so that scoring overhead is removed since
> > scoring is not required in my search use case. I also use a custom
> > Sort object which does not sort by score (see code below).
>
> If you don't sort by score, then wrapping with a ConstantScoreQuery won't
> help, as Lucene will figure out that scores are not needed anyway.
>
> > 2. Using query cache
> >
> > My understanding is that the query cache would cache query results and
> > hence lead to a significant increase in performance. Is this
> > understanding correct?
>
> It depends what you mean by performance. If you are optimizing for
> worst-case latency, then the query cache might make things worse, because
> caching a query requires visiting all matches, while query execution can
> sometimes just skip over non-interesting matches (e.g. in conjunctions).
>
> However, if you are looking at improving throughput, then the query
> cache's default policy of caching queries that look reused usually helps.
>
> > I am using Lucene version 5.4.1 where the query cache seems to be
> > enabled by default (https://issues.apache.org/jira/browse/LUCENE-6784),
> > but I am not able to see any significant change in search performance.
> >
> > Here is the code I am testing with:
> >
> > DirectoryReader reader = DirectoryReader.open(directory); //using
> > MMapDirectory
> >
> > IndexSearcher searcher = new IndexSearcher(reader); //IndexReader and
> > IndexSearcher are created only once
> >
> > searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);
>
> Don't do that: this will always cache all filters, which usually makes
> things slower for the reason mentioned above.
> I would rather advise that you use an instance of
> UsageTrackingQueryCachingPolicy.
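
For reference, swapping in that policy looks roughly like the sketch below (against the 5.4 API; the LRUQueryCache sizes are made-up examples, and the setQueryCache call can be skipped to keep the default shared cache):

    DirectoryReader reader = DirectoryReader.open(directory); // using MMapDirectory
    IndexSearcher searcher = new IndexSearcher(reader);

    // Only cache queries that the policy has observed being reused,
    // instead of unconditionally caching everything like ALWAYS_CACHE.
    searcher.setQueryCachingPolicy(new UsageTrackingQueryCachingPolicy());

    // Optional: a dedicated cache (at most 1000 queries, 64MB) instead of
    // the default cache shared across IndexSearcher instances.
    searcher.setQueryCache(new LRUQueryCache(1000, 64 * 1024 * 1024));
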