Yes, this collector won't perform well if you have many matches, since its
memory usage is linear in the number of matches. A better option would be
to extend e.g. IntComparator and implement getNumericDocValues by returning
a fake NumericDocValues instance that e.g. does a bit mix of the doc ID and
a per-request seed (for instance, HPPC's BitMixer can do that:
https://github.com/carrotsearch/hppc/blob/master/hppc/src/main/java/com/carrotsearch/hppc/BitMixer.java
).
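For illustration, here is a minimal, self-contained sketch of just the bit-mixing
idea: a MurmurHash3-style finalizer (similar in spirit to what HPPC's BitMixer
does) applied to the doc ID XORed with a per-request seed. The Lucene
comparator/NumericDocValues wiring is deliberately omitted, and the class and
method names below are made up for the example:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RandomSortSketch {
    // MurmurHash3 fmix32 finalizer: a cheap, stateless avalanche mix.
    // It is a bijection on 32-bit ints, so distinct inputs map to distinct keys.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Pseudo-random but deterministic sort key for a doc under a given seed.
    // A fake NumericDocValues instance would return this as the doc's value.
    static int sortKey(int docId, int seed) {
        return mix(docId ^ seed);
    }

    public static void main(String[] args) {
        int seed = 0x9e3779b9; // hypothetical per-request seed
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < 10; d++) {
            docs.add(d);
        }
        // Sorting by the mixed key yields a seed-dependent "random" order
        // without buffering all matches in memory first.
        docs.sort(Comparator.comparingInt(d -> sortKey(d, seed)));
        System.out.println(docs);
    }
}
```

Because the key is a pure function of the doc ID and the seed, the order is
stable within one request but changes from request to request with the seed,
and nothing has to be buffered per match.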

On Thu, Feb 1, 2018 at 12:31, Atul Bisaria <atul.bisa...@ericsson.com>
wrote:

> Hi Adrien,
>
> Thanks for your reply.
>
> I have also tried testing with UsageTrackingQueryCachingPolicy, but did
> not observe a significant change in both latency and throughput.
>
> Given that I have specific search requirements of no scoring and sorting
> the search results in a random order (reason for custom sort object), I
> have also explored writing a custom collector and could observe quite a
> difference in latency figures.
>
> Let me know if this custom collector code has any loopholes which I could
> be missing:
>
> class RandomOrderCollector extends SimpleCollector
> {
>     private int maxHitsRequired;
>     private int docBase;
>
>     private final List<Integer> matches = new ArrayList<>();
>
>     public RandomOrderCollector(int maxHitsRequired)
>     {
>         this.maxHitsRequired = maxHitsRequired;
>     }
>
>     @Override
>     public boolean needsScores()
>     {
>         return false;
>     }
>
>     @Override
>     public void collect(int doc) throws IOException
>     {
>         matches.add(docBase + doc);
>     }
>
>     @Override
>     protected void doSetNextReader(LeafReaderContext context) throws IOException
>     {
>         super.doSetNextReader(context);
>         this.docBase = context.docBase;
>     }
>
>     public List<Integer> getHits()
>     {
>         Collections.shuffle(matches);
>         maxHitsRequired = Math.min(matches.size(), maxHitsRequired);
>         return matches.subList(0, maxHitsRequired);
>     }
> }
>
> Best Regards,
> Atul Bisaria
>
> -----Original Message-----
> From: Adrien Grand [mailto:jpou...@gmail.com]
> Sent: Wednesday, January 31, 2018 6:33 PM
> To: java-user@lucene.apache.org
> Subject: Re: Increase search performance
>
> Hi Atul,
>
>
> On Tue, Jan 30, 2018 at 16:24, Atul Bisaria <atul.bisa...@ericsson.com>
> wrote:
>
> > 1.     Using ConstantScoreQuery so that scoring overhead is removed since
> > scoring is not required in my search use case. I also use a custom
> > Sort object which does not sort by score (see code below).
> >
>
> If you don't sort by score, then wrapping with a ConstantScoreQuery won't
> help as Lucene will figure out scores are not needed anyway.
>
>
> > 2.     Using query cache
> >
> >
> >
> > My understanding is that query cache would cache query results and
> > hence lead to significant increase in performance. Is this understanding
> correct?
> >
>
> It depends on what you mean by performance. If you are optimizing for
> worst-case latency, then the query cache might make things worse, since
> caching a query requires visiting all of its matches, while regular query
> execution can sometimes just skip over non-interesting matches (e.g. in
> conjunctions).
>
> However, if you are looking to improve throughput, then the query cache's
> default policy of caching queries that look reused usually helps.
>
>
> > I am using Lucene version 5.4.1 where query cache seems to be enabled
> > by default (https://issues.apache.org/jira/browse/LUCENE-6784), but I
> > am not able to see any significant change in search performance.
> >
>
> > Here is the code I am testing with:
> >
> >
> >
> > DirectoryReader reader = DirectoryReader.open(directory); // using MMapDirectory
> >
> > // IndexReader and IndexSearcher are created only once
> > IndexSearcher searcher = new IndexSearcher(reader);
> >
> > searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);
> >
>
> Don't do that: it will cache all filters unconditionally, which usually
> makes things slower for the reason mentioned above. I would advise using an
> instance of UsageTrackingQueryCachingPolicy instead.
>
