Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()

Adrien Grand Tue, 05 Oct 2021 12:18:45 -0700

Hmm, we should get rid of these access$ accessors by fixing the visibility of
some fields.
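
For readers unfamiliar with these accessors, here is a minimal sketch. It assumes class files compiled for a release before Java 11, where javac emits a synthetic static access$NNN method whenever a nested or anonymous class touches a private member of its enclosing class. The class and field names are hypothetical stand-ins, not Lucene's WANDScorer.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;

public class AccessorSketch {

    // Hypothetical stand-in for a scorer with a private field.
    static class PrivateField {
        private int doc = 41;

        IntSupplier next() {
            return new IntSupplier() {
                @Override
                public int getAsInt() {
                    // When targeting a release before Java 11, this read of a
                    // private outer field is routed through a synthetic
                    // access$NNN method on PrivateField.
                    return doc + 1;
                }
            };
        }
    }

    // Same logic, but the field is package-private, so the anonymous class
    // can read it directly and no accessor is needed.
    static class PackagePrivateField {
        int doc = 41;

        IntSupplier next() {
            return new IntSupplier() {
                @Override
                public int getAsInt() {
                    return doc + 1;
                }
            };
        }
    }

    // Counts the declared methods of a class whose names start with "access$".
    static long syntheticAccessors(Class<?> c) {
        return Arrays.stream(c.getDeclaredMethods())
                .filter(m -> m.getName().startsWith("access$"))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("private field:         "
                + syntheticAccessors(PrivateField.class) + " accessor(s)");
        System.out.println("package-private field: "
                + syntheticAccessors(PackagePrivateField.class) + " accessor(s)");
    }
}
```

The count for the private variant depends on the compiler target: accessors appear when targeting Java 8, and are absent from Java 11 on, where nest-based access control makes them unnecessary. The package-private variant needs none either way.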


These breakdowns do not necessarily signal that something is wrong. Is the
query executing fast overall?
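
One reason such breakdowns can look alarming is the inclusive versus exclusive distinction raised earlier in this thread: an inclusive figure charges a method for everything it calls. The sketch below derives self (exclusive) time from inclusive time. The Frame type and the timings are made up for illustration and are not taken from the profile.

```java
import java.util.ArrayList;
import java.util.List;

public class ProfileSketch {

    // Hypothetical call-tree node carrying an inclusive time in milliseconds.
    static final class Frame {
        final String name;
        final long inclusiveMs;
        final List<Frame> callees = new ArrayList<>();

        Frame(String name, long inclusiveMs) {
            this.name = name;
            this.inclusiveMs = inclusiveMs;
        }

        // Registers a direct callee and returns this frame for chaining.
        Frame call(Frame callee) {
            callees.add(callee);
            return this;
        }

        // Self time = inclusive time minus the inclusive time of direct callees.
        long selfMs() {
            long inCallees = 0;
            for (Frame c : callees) {
                inCallees += c.inclusiveMs;
            }
            return inclusiveMs - inCallees;
        }
    }

    public static void main(String[] args) {
        // Made-up numbers: the leaf accounts for all of the time of its callers.
        Frame readVLong = new Frame("DataInput.readVLong", 100);
        Frame scorer = new Frame("BooleanWeight.scorer", 100).call(readVLong);
        Frame bulkScorer = new Frame("BooleanWeight.bulkScorer", 100).call(scorer);

        for (Frame f : List.of(bulkScorer, scorer, readVLong)) {
            System.out.println(f.name + " inclusive=" + f.inclusiveMs
                    + "ms self=" + f.selfMs() + "ms");
        }
    }
}
```

With numbers shaped like these, the whole inclusive cost of the top frame is really self time of the leaf, so the top-level figure alone says little about the method it is attached to.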

On Mon, Oct 4, 2021 at 11:57 PM Baris Kazar <[email protected]> wrote:

> Hi, -
> I ran more experiments, and this time I looked into these methods:
> org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
>
>
> Let's start with BooleanWeight.bulkScorer(), its call tree, and the time
> spent:
>
>
> BooleanWeight.bulkScorer()
> -->> Weight.bulkScorer()
> -->>-->> BooleanWeight.scorer()
> -->>-->>-->>BooleanWeight.scorerSupplier()
> -->>-->>-->>-->> Weight.scorerSupplier()
> -->>-->>-->>-->>-->> TermQuery$TermWeight.scorer()
> -->>-->>-->>-->>-->>-->>
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.impacts()
> -->>-->>-->>-->>-->>-->>-->>
> org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.impacts()
> -->>-->>-->>-->>-->>-->>-->>-->>
> org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsDocsEnum.init()
> -->>-->>-->>-->>-->>-->>-->>-->>-->>
> org.apache.lucene.codecs.lucene84.Lucene84SkipReader.init()
> -->>-->>-->>-->>-->>-->>-->>-->>-->>-->>
> org.apache.lucene.codecs.MultiLevelSkipListReader.init()
> -->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>
> org.apache.lucene.codecs.MultiLevelSkipListReader.loadSkipLevels()
> -->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>
> org.apache.lucene.store.DataInput.readVLong() (constitutes 100% of
> BooleanWeight.bulkScorer() time here)
>
>
>
> Next: BulkScorer.score() with its call tree and time spent:
>
>
>
> BulkScorer.score()
> -->> Weight$DefaultBulkScorer.score()
> -->>-->> Weight$DefaultBulkScorer.scoreAll()
> -->>-->>-->> WANDScorer$1.nextDoc()
> -->>-->>-->>-->> WANDScorer$1.advance()
> -->>-->>-->>-->>-->> WANDScorer.access$300() (constitutes 65% of
> BulkScorer.score() time here)
> -->>-->>-->>-->>-->> WANDScorer.access$100() (constitutes 30% of
> BulkScorer.score() time here)
> -->>-->>-->>-->>-->> WANDScorer.access$400() (constitutes 5% of
> BulkScorer.score() time here)
>
> Best regards
>
> ________________________________
> From: Baris Kazar <[email protected]>
> Sent: Saturday, October 2, 2021 3:14 PM
> To: Adrien Grand <[email protected]>; Lucene Users Mailing List <
> [email protected]>
> Cc: Baris Kazar <[email protected]>
> Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and
> BulkScorer.score()
>
> Hi Adrien,-
> Thanks. Next week I will look at the methods called within
> BulkScorer#score to see which of them take the most time.
>
> Jvisualvm reports, for each method, the total time including the time spent
> in the methods it calls, and the execution tree can be followed down to the
> very last called method.
>
> Regarding your second paragraph:
> when does a Lucene index have too many segments? I have 1 text
> field and 1 stored (non-indexed) field.
>
> Most of the time I get a couple of thousand hits and I ask for the top 20
> of them. Could this be what makes
> BooleanWeight#bulkScorer spend so much time?
>
> These two units,
> BooleanWeight#bulkScorer and BulkScorer#score, take equal amounts of time
> and together make up
> 75% of IndexSearcher#search, as I mentioned before.
>
> Thanks for the swift reply.
> I appreciate it very much.
>
>
> Best regards
> ________________________________
> From: Adrien Grand <[email protected]>
> Sent: Saturday, October 2, 2021 1:44:40 AM
> To: Lucene Users Mailing List <[email protected]>
> Cc: Baris Kazar <[email protected]>
> Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and
> BulkScorer.score()
>
> Is your profiler reporting inclusive or exclusive costs for each function?
> I.e., does it exclude time spent in functions that are called within a
> function? I'm asking because it makes total sense for IndexSearcher#search
> to spend most of its time in BulkScorer#score, which coordinates the whole
> matching+scoring process.
>
> Having much time spent in BooleanWeight#bulkScorer is a bit surprising
> however. This suggests that you have too many segments in your index (since
> the bulk scorer needs to be recreated for every segment) or that your
> average query matches a very low number of documents (so that Lucene spends
> more time figuring out how best to find the matches versus actually finding
> these matches).
>
> On Sat, Oct 2, 2021 at 5:57 AM Baris Kazar <[email protected]> wrote:
> Hi,-
> I profiled my Java application with jvisualvm
> and saw that 75% of the time in
> org.apache.lucene.search.IndexSearcher.search() is spent in
> these units:
> org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
> Is there any study or project to speed these up, please?
>
> Best regards
>
>
>
> --
> Adrien
>


-- 
Adrien
