Hi Adrien,

Please correct me if I am wrong, but I believe that using an extended IntComparator in a custom Sort object for randomization would still score documents (when using IndexSearcher.search(Query, int, Sort), for example).
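As context for the randomized sort under discussion: the suggestion further down the thread is to derive the sort key by bit-mixing the doc id with a per-request seed (e.g. via HPPC's BitMixer), so no per-match state is needed. A minimal, Lucene-free sketch of such a key function follows; the class name is my own, and SplitMix64's finalizer constants stand in for BitMixer:

```java
// Illustrative sketch, not code from this thread. Produces a deterministic
// pseudo-random 64-bit key per (docId, seed) pair; sorting by this key gives
// a random-looking but stable order for one request's seed.
final class RandomSortKey {
    private final long seed;

    RandomSortKey(long seed) {
        this.seed = seed;
    }

    /** Deterministic pseudo-random key for a doc id under this request's seed. */
    long keyFor(int docId) {
        long z = docId ^ seed;
        // SplitMix64 finalizer: a bijective bit mix, so distinct inputs
        // always yield distinct keys.
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }
}
```

Because the key is a pure function of doc id and seed, repeated comparator calls for the same request agree with each other, while a new seed yields a fresh ordering.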
So I tried using a custom collector via IndexSearcher.search(Query, Collector), where the custom collector does not score documents at all. I have refactored RandomOrderCollector to fix the memory usage problem as described below: memory is now bounded by maxHitsRequired instead of growing with the number of matches. Let me know if this looks ok now.

class RandomOrderCollector extends SimpleCollector
{
    private int maxHitsRequired;
    private int docBase;
    private ScoreDoc[] matches;
    private int numHits;
    private Random random = new Random();

    public RandomOrderCollector(int maxHitsRequired)
    {
        this.maxHitsRequired = maxHitsRequired;
        this.matches = new ScoreDoc[maxHitsRequired];
    }

    @Override
    public boolean needsScores()
    {
        return false;
    }

    @Override
    public void collect(int doc) throws IOException
    {
        int absoluteDoc = docBase + doc;
        int randomScore = random.nextInt(); // assign a random score to each doc
        if (numHits < maxHitsRequired)
        {
            matches[numHits++] = new ScoreDoc(absoluteDoc, randomScore);
        }
        else
        {
            int index = random.nextInt(maxHitsRequired);
            if (matches[index].score < randomScore)
            {
                matches[index] = new ScoreDoc(absoluteDoc, randomScore);
            }
        }
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException
    {
        super.doSetNextReader(context);
        this.docBase = context.docBase;
    }

    public ScoreDoc[] getHits()
    {
        // return only the filled slots when there were fewer matches than maxHitsRequired
        return numHits < maxHitsRequired ? Arrays.copyOf(matches, numHits) : matches;
    }
}

Best Regards,
Atul Bisaria

-----Original Message-----
From: Adrien Grand [mailto:jpou...@gmail.com]
Sent: Thursday, February 01, 2018 6:11 PM
To: java-user@lucene.apache.org
Subject: Re: Increase search performance

Yes, this collector won't perform well if you have many matches since memory usage is linear with the number of matches. A better option would be to extend e.g. IntComparator and implement getNumericDocValues by returning a fake NumericDocValues instance that e.g. does a bit mix of the doc id and a per-request seed (for instance HPPC's BitMixer can do that:
https://github.com/carrotsearch/hppc/blob/master/hppc/src/main/java/com/carrotsearch/hppc/BitMixer.java ).

On Thu, Feb 1, 2018 at 12:31, Atul Bisaria <atul.bisa...@ericsson.com> wrote:

> Hi Adrien,
>
> Thanks for your reply.
>
> I have also tried testing with UsageTrackingQueryCachingPolicy, but
> did not observe a significant change in either latency or throughput.
>
> Given my specific search requirements of no scoring and of sorting
> the search results in a random order (the reason for the custom Sort
> object), I have also explored writing a custom collector and observed
> quite a difference in latency figures.
>
> Let me know if this custom collector code has any loopholes I could
> be missing:
>
> class RandomOrderCollector extends SimpleCollector
> {
>     private int maxHitsRequired;
>     private int docBase;
>
>     private List<Integer> matches = new ArrayList<Integer>();
>
>     public RandomOrderCollector(int maxHitsRequired)
>     {
>         this.maxHitsRequired = maxHitsRequired;
>     }
>
>     public boolean needsScores()
>     {
>         return false;
>     }
>
>     @Override
>     public void collect(int doc) throws IOException
>     {
>         matches.add(docBase + doc);
>     }
>
>     @Override
>     protected void doSetNextReader(LeafReaderContext context)
>         throws IOException
>     {
>         super.doSetNextReader(context);
>         this.docBase = context.docBase;
>     }
>
>     public List<Integer> getHits()
>     {
>         Collections.shuffle(matches);
>         maxHitsRequired = Math.min(matches.size(), maxHitsRequired);
>
>         return matches.subList(0, maxHitsRequired);
>     }
> }
>
> Best Regards,
> Atul Bisaria
>
> -----Original Message-----
> From: Adrien Grand [mailto:jpou...@gmail.com]
> Sent: Wednesday, January 31, 2018 6:33 PM
> To: java-user@lucene.apache.org
> Subject: Re: Increase search performance
>
> Hi Atul,
>
> On Tue, Jan 30, 2018 at 16:24, Atul Bisaria <atul.bisa...@ericsson.com> wrote:
>
> > 1. Using ConstantScoreQuery so that scoring overhead is removed,
> > since scoring is not required in my search use case. I also use a
> > custom Sort object which does not sort by score (see code below).
>
> If you don't sort by score, then wrapping with a ConstantScoreQuery
> won't help, as Lucene will figure out that scores are not needed
> anyway.
>
> > 2. Using query cache
> >
> > My understanding is that the query cache would cache query results
> > and hence lead to a significant increase in performance. Is this
> > understanding correct?
>
> It depends what you mean by performance. If you are optimizing for
> worst-case latency, then the query cache might make things worse due
> to the fact that caching a query requires visiting all matches, while
> query execution can sometimes just skip over non-interesting matches
> (e.g. in conjunctions).
>
> However, if you are looking to improve throughput, then the default
> policy of the query cache, which caches queries that look reused,
> usually helps.
>
> > I am using Lucene version 5.4.1, where the query cache seems to be
> > enabled by default
> > (https://issues.apache.org/jira/browse/LUCENE-6784), but I am not
> > able to see any significant change in search performance.
> >
> > Here is the code I am testing with:
> >
> > DirectoryReader reader = DirectoryReader.open(directory); // using
> > MMapDirectory
> >
> > IndexSearcher searcher = new IndexSearcher(reader); // IndexReader
> > and IndexSearcher are created only once
> >
> > searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);
>
> Don't do that: this will always cache all filters, which usually makes
> things slower for the reason mentioned above. I would rather advise
> that you use an instance of UsageTrackingQueryCachingPolicy.
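As a footnote on the bounded-memory collector discussed in this thread: the textbook way to draw a fixed-size uniform sample from a stream of matches is reservoir sampling (Algorithm R). The sketch below is Lucene-free and not the thread's code (class and method names are my own); like the refactored RandomOrderCollector, its memory is bounded by the sample size rather than the number of matches, but unlike the random-score replacement scheme, every element ends up in the sample with equal probability.

```java
import java.util.Random;

// Illustrative sketch of Algorithm R reservoir sampling. After seeing n
// elements, each one is in the reservoir with probability size/n.
final class Reservoir {
    private final int[] sample;
    private final Random random;
    private int seen;

    Reservoir(int size, Random random) {
        this.sample = new int[size];
        this.random = random;
    }

    /** Offer one element (e.g. a doc id) to the reservoir. */
    void collect(int value) {
        if (seen < sample.length) {
            // Reservoir not full yet: take the element unconditionally.
            sample[seen] = value;
        } else {
            // Replace a random slot with probability size / (seen + 1).
            int slot = random.nextInt(seen + 1);
            if (slot < sample.length) {
                sample[slot] = value;
            }
        }
        seen++;
    }

    /** Number of valid entries currently in the sample. */
    int size() {
        return Math.min(seen, sample.length);
    }

    int get(int i) {
        return sample[i];
    }
}
```

A collector along these lines would call collect(docBase + doc) per hit and expose the reservoir from getHits(), with the seen-count check replacing the score comparison.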