Thanks for the feedback!

-----Original Message-----
From: Adrien Grand [mailto:jpou...@gmail.com]
Sent: Friday, February 02, 2018 1:42 PM
To: java-user@lucene.apache.org
Subject: Re: Increase search performance
If needsScores returns false on the collector, then scores won't be computed. Your prototype should work well.

On Fri, Feb 2, 2018 at 04:46, Atul Bisaria <atul.bisa...@ericsson.com> wrote:
> Hi Adrien,
>
> Please correct me if I am wrong, but I believe using the extended
> IntComparator in a custom Sort object for randomization would still
> score documents (using IndexSearcher.search(Query, int, Sort), for
> example).
>
> So I tried using a custom collector with IndexSearcher.search(Query,
> Collector), where the custom collector does not score documents at all.
>
> I have refactored RandomOrderCollector to fix the memory usage problem
> as described below. Let me know if this looks ok now.
>
> class RandomOrderCollector extends SimpleCollector {
>     private int maxHitsRequired;
>     private int docBase;
>     private ScoreDoc[] matches;
>     private int numHits;
>     private Random random = new Random();
>
>     public RandomOrderCollector(int maxHitsRequired) {
>         this.maxHitsRequired = maxHitsRequired;
>         this.matches = new ScoreDoc[maxHitsRequired];
>     }
>
>     @Override
>     public boolean needsScores() {
>         return false;
>     }
>
>     @Override
>     public void collect(int doc) throws IOException {
>         int absoluteDoc = docBase + doc;
>         int randomScore = random.nextInt(); // assign a random score to each doc
>
>         if (numHits < maxHitsRequired) {
>             matches[numHits++] = new ScoreDoc(absoluteDoc, randomScore);
>         } else {
>             int index = random.nextInt(maxHitsRequired);
>             if (matches[index].score < randomScore) {
>                 matches[index] = new ScoreDoc(absoluteDoc, randomScore);
>             }
>         }
>     }
>
>     @Override
>     protected void doSetNextReader(LeafReaderContext context) throws IOException {
>         super.doSetNextReader(context);
>         this.docBase = context.docBase;
>     }
>
>     public ScoreDoc[] getHits() {
>         // note: trailing entries are null if fewer than maxHitsRequired docs matched
>         return matches;
>     }
> }
>
> Best Regards,
> Atul Bisaria
>
> -----Original Message-----
> From: Adrien Grand [mailto:jpou...@gmail.com]
> Sent: Thursday, February 01, 2018 6:11 PM
> To: java-user@lucene.apache.org
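[A stand-alone illustration of the bounded-memory idea discussed above. This is plain Java with illustrative names, not Lucene API: classic reservoir sampling (Algorithm R) keeps at most k collected doc ids in memory and yields a uniform random sample, without having to assign and compare random scores.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch: reservoir sampling over a stream of doc ids. Memory stays
// bounded at k entries no matter how many docs match, and every match
// has an equal probability of ending up in the sample.
class DocIdReservoir {
    private final int k;
    private final int[] sample;
    private int seen = 0;
    private final Random random;

    DocIdReservoir(int k, long seed) {
        this.k = k;
        this.sample = new int[k];
        this.random = new Random(seed);
    }

    void collect(int docId) {
        if (seen < k) {
            sample[seen] = docId; // fill the reservoir first
        } else {
            // replace a random slot with probability k / (seen + 1)
            int slot = random.nextInt(seen + 1);
            if (slot < k) {
                sample[slot] = docId;
            }
        }
        seen++;
    }

    List<Integer> getHits() {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < Math.min(seen, k); i++) {
            hits.add(sample[i]);
        }
        return hits;
    }
}
```

[In a real collector, collect(docId) would be called from SimpleCollector.collect with docBase + doc, exactly as in the prototype above.]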
> Subject: Re: Increase search performance
>
> Yes, this collector won't perform well if you have many matches, since
> memory usage is linear in the number of matches. A better option would
> be to extend e.g. IntComparator and implement getNumericDocValues by
> returning a fake NumericDocValues instance that e.g. does a bit mix of
> the doc id and a per-request seed (for instance HPPC's BitMixer can do
> that:
> https://github.com/carrotsearch/hppc/blob/master/hppc/src/main/java/com/carrotsearch/hppc/BitMixer.java
> ).
>
> On Thu, Feb 1, 2018 at 12:31, Atul Bisaria <atul.bisa...@ericsson.com> wrote:
> > Hi Adrien,
> >
> > Thanks for your reply.
> >
> > I have also tried testing with UsageTrackingQueryCachingPolicy, but
> > did not observe a significant change in either latency or throughput.
> >
> > Given my specific search requirements of no scoring and sorting the
> > search results in a random order (the reason for the custom Sort
> > object), I have also explored writing a custom collector and observed
> > quite a difference in latency figures.
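[The bit-mix suggestion can be sketched in plain Java. This is an illustrative stand-in, not the actual Lucene FieldComparator/NumericDocValues plumbing: a deterministic pseudo-random sort key is computed on the fly from the doc id and a per-request seed, so nothing is stored per document. The mix64 function below is MurmurHash3's fmix64 finalizer, similar in spirit to HPPC's BitMixer; all class and method names are hypothetical.]

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the "fake NumericDocValues" idea: the sort key for a doc is
// a bit mix of its id and a per-request seed. The same seed reproduces
// the same order (useful for paging); a fresh seed gives a new random
// order, with O(1) memory per request.
final class RandomSortKey {
    // MurmurHash3 fmix64 finalizer: avalanches all input bits.
    static long mix64(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
        return z ^ (z >>> 33);
    }

    static long keyFor(int docId, long seed) {
        return mix64(docId ^ seed);
    }

    static List<Integer> randomOrder(List<Integer> docIds, long seed) {
        List<Integer> sorted = new ArrayList<>(docIds);
        sorted.sort(Comparator.comparingLong(d -> keyFor(d, seed)));
        return sorted;
    }
}
```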
> >
> > Let me know if this custom collector code has any loopholes which I
> > could be missing:
> >
> > class RandomOrderCollector extends SimpleCollector {
> >     private int maxHitsRequired;
> >     private int docBase;
> >     private List<Integer> matches = new ArrayList<Integer>();
> >
> >     public RandomOrderCollector(int maxHitsRequired) {
> >         this.maxHitsRequired = maxHitsRequired;
> >     }
> >
> >     @Override
> >     public boolean needsScores() {
> >         return false;
> >     }
> >
> >     @Override
> >     public void collect(int doc) throws IOException {
> >         matches.add(docBase + doc);
> >     }
> >
> >     @Override
> >     protected void doSetNextReader(LeafReaderContext context) throws IOException {
> >         super.doSetNextReader(context);
> >         this.docBase = context.docBase;
> >     }
> >
> >     public List<Integer> getHits() {
> >         Collections.shuffle(matches);
> >         maxHitsRequired = Math.min(matches.size(), maxHitsRequired);
> >         return matches.subList(0, maxHitsRequired);
> >     }
> > }
> >
> > Best Regards,
> > Atul Bisaria
> >
> > -----Original Message-----
> > From: Adrien Grand [mailto:jpou...@gmail.com]
> > Sent: Wednesday, January 31, 2018 6:33 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Increase search performance
> >
> > Hi Atul,
> >
> > On Tue, Jan 30, 2018 at 16:24, Atul Bisaria <atul.bisa...@ericsson.com> wrote:
> >
> > > 1. Using ConstantScoreQuery so that the scoring overhead is removed,
> > > since scoring is not required in my search use case. I also use a
> > > custom Sort object which does not sort by score (see code below).
> >
> > If you don't sort by score, then wrapping with a ConstantScoreQuery
> > won't help, as Lucene will figure out that scores are not needed anyway.
> >
> > > 2. Using query cache
> > >
> > > My understanding is that the query cache would cache query results
> > > and hence lead to a significant increase in performance. Is this
> > > understanding correct?
> >
> > It depends what you mean by performance.
If you are optimizing for
> > worst-case latency, then the query cache might make things worse,
> > because caching a query requires visiting all of its matches, while
> > query execution can sometimes just skip over non-interesting matches
> > (e.g. in conjunctions).
> >
> > However, if you are looking at improving throughput, then the default
> > policy of the query cache, which caches queries that look reused,
> > usually helps.
> >
> > > I am using Lucene version 5.4.1, where the query cache seems to be
> > > enabled by default
> > > (https://issues.apache.org/jira/browse/LUCENE-6784), but I am not
> > > able to see any significant change in search performance.
> > >
> > > Here is the code I am testing with:
> > >
> > > DirectoryReader reader = DirectoryReader.open(directory); // using MMapDirectory
> > >
> > > IndexSearcher searcher = new IndexSearcher(reader); // IndexReader
> > > and IndexSearcher are created only once
> > >
> > > searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);
> >
> > Don't do that, this will always cache all filters, which usually
> > makes things slower for the reason mentioned above. I would rather
> > advise that you use an instance of UsageTrackingQueryCachingPolicy.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
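[The admission idea behind a usage-tracking caching policy can be illustrated with a small stand-alone sketch. This is plain Java with assumed names, not Lucene's actual UsageTrackingQueryCachingPolicy: a query is only admitted to the cache once it has been seen enough times, so one-off filters never pay the cache-population cost that makes caching everything slow.]

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a usage-tracking admission policy: count how often each
// query key has been seen, and only report "cache it" once the count
// reaches minFrequency. Queries that run once are never cached.
final class UsageTrackingPolicySketch {
    private final int minFrequency;
    private final Map<String, Integer> counts = new HashMap<>();

    UsageTrackingPolicySketch(int minFrequency) {
        this.minFrequency = minFrequency;
    }

    // Called each time a query is about to run; true means it is worth
    // paying the cost of caching it now.
    boolean shouldCache(String queryKey) {
        int seen = counts.merge(queryKey, 1, Integer::sum);
        return seen >= minFrequency;
    }
}
```

[The real Lucene policy is more sophisticated (it keeps a bounded history and uses different thresholds for cheap and costly queries), but the admission principle is the same.]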