[
https://issues.apache.org/jira/browse/SOLR-16555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640070#comment-17640070
]
ASF subversion and git services commented on SOLR-16555:
--------------------------------------------------------
Commit 1a964804025f7fddf31cbd54b51e2619decbcae2 in lucene-solr's branch
refs/heads/branch_8_11 from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1a964804025 ]
SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not
clone bitsets repeatedly (#1184) (#2675)
Co-authored-by: David Smiley <[email protected]>
> SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets
> repeatedly
> ----------------------------------------------------------------------------------------
>
> Key: SOLR-16555
> URL: https://issues.apache.org/jira/browse/SOLR-16555
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: query
> Reporter: Kevin Risden
> Assignee: Kevin Risden
> Priority: Major
> Labels: performance
> Fix For: main (10.0)
>
> Attachments: Screenshot 2022-11-16 at 14.52.37.png, Screenshot
> 2022-11-16 at 14.53.23.png, Screenshot 2022-11-16 at 14.53.35.png, Screenshot
> 2022-11-17 at 13.03.21.png, Screenshot 2022-11-17 at 13.25.57.png, Screenshot
> 2022-11-17 at 13.28.06.png
>
> Time Spent: 7h 50m
> Remaining Estimate: 0h
>
> SolrIndexSearcher takes the bitset from the result and tries to combine it
> with all the cached filter queries. Currently this duplicates the bitset
> multiple times based on the number of filter queries. It looks like this
> isn't necessary and instead could just operate on the bitset itself or a
> single mutable copy of the bitset.
> Lines 1219 to 1225
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1219
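
The difference between cloning per filter and working on a single mutable copy can be sketched as follows. This is a minimal illustration, not the SolrIndexSearcher code: java.util.BitSet stands in for Solr's bitset-backed DocSet, and the class and method names (FilterCombineSketch, combineWithCopies, combineInPlace) are hypothetical.

```java
import java.util.BitSet;
import java.util.List;

// Illustrative only: java.util.BitSet stands in for Solr's bitset-backed DocSet.
final class FilterCombineSketch {

    // Mimics the pattern described in the issue: each intersection produces a
    // fresh bitset, so n filters allocate n full-size copies.
    static BitSet combineWithCopies(BitSet base, List<BitSet> filters) {
        BitSet result = base;
        for (BitSet filter : filters) {
            BitSet copy = (BitSet) result.clone(); // new allocation per filter
            copy.and(filter);
            result = copy;
        }
        return result;
    }

    // Alternative: take one mutable copy up front and intersect in place.
    // The same idea applies to andNot for negated filters.
    static BitSet combineInPlace(BitSet base, List<BitSet> filters) {
        BitSet result = (BitSet) base.clone(); // the only allocation
        for (BitSet filter : filters) {
            result.and(filter); // mutates result; cached filters stay untouched
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet base = new BitSet(16);
        base.set(0, 10); // docs 0..9 match the main query

        BitSet even = new BitSet(16);
        for (int i = 0; i < 16; i += 2) {
            even.set(i);
        }

        BitSet low = new BitSet(16);
        low.set(0, 5); // docs 0..4

        List<BitSet> filters = List.of(even, low);
        System.out.println(combineWithCopies(base, filters));
        System.out.println(combineInPlace(base, filters));
    }
}
```

Both methods return the same documents; only the number of intermediate bitsets differs. The cached filters must never be mutated, which is why the single copy is taken from the base result rather than from a filter.
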
> ----
> I've been using async profiler
> (https://github.com/jvm-profiling-tools/async-profiler) to look at some
> performance stuff with Solr for a client. Originally I looked at CPU in the
> profile and found that I could also capture and look at memory allocations
> during the same run. This led me to finding this crazy amount of memory
> allocation over a short period of time.
> Async profiler is being run with the following parameters, which capture CPU,
> memory, and lock information for a 300 second period on some pid.
> {code:java}
> /opt/async-profiler/profiler.sh -a -d 300 -o jfr -e cpu,alloc,lock -f /tmp/profile.jfr PID_GOES_HERE
> {code}
> The resulting JFR is usually ~100-200MB, so I am not going to attach it
> here; it also contains some client-specific methods in some of the calls.
> However, the screenshots below, taken after loading the JFR in both IntelliJ
> and Java Mission Control, show some of the findings:
> !Screenshot 2022-11-17 at 13.28.06.png|width=750!
> The memory allocated from SolrIndexSearcher#getProcessedFilter is ~60% of
> total memory allocated during the 5 minute profile period.
> !Screenshot 2022-11-16 at 14.52.37.png|width=750!
> This shows that in 5 minutes there were ~1TB (yes, that's TB = terabyte, or
> 1000GB) of memory allocations for SolrIndexSearcher#getProcessedFilter.
> !Screenshot 2022-11-16 at 14.53.23.png|width=750!
> ~680GB was allocated from BitDocSet#intersection
> !Screenshot 2022-11-16 at 14.53.35.png|width=750!
> ~315GB was allocated from BitDocSet#andNot
> Based on CPU profiling, it is amazing to me that the G1 garbage collector is
> keeping up. Each of these objects is very short-lived.
> This was during some load testing, so I am able to give some details on the
> query types in question:
> * ~30 queries/second
> * ~5 fq parameters per query
> * so ~9000 queries in 5 minutes with ~45000 fq clauses
> * 10GB heap for the Solr instance with 128GB ram on the node and index size
> completely fits in memory.
> * this is one shard on the node for testing and ~23 million documents in the
> shard - optimized so no deletes.
> * This was tested with Solr 8.8, but as far as I can tell the code has not
> changed significantly on the main branch (slight refactoring of the class
> hierarchy after SOLR-14256, but it didn't affect intersection/andNot).
> Based on my rough calculations, that is ~24MB of heap per filter query clause
> (1.06TB/45000) or ~120MB of heap per query (assuming 5 fq per query).
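
The rough calculation above can be reproduced with a few lines of arithmetic. This is a sketch using decimal units (1 MB = 10^6 bytes); the class and method names are hypothetical, and the bitset size at the end is an added assumption (one bit per document over the ~23 million document shard), not a number taken from the profile.

```java
// Back-of-the-envelope check of the figures quoted in the issue.
final class AllocationEstimate {

    // Number of queries issued at a steady rate over a time window.
    static long queries(int queriesPerSecond, int seconds) {
        return (long) queriesPerSecond * seconds;
    }

    // Average megabytes (decimal) attributed to each of `count` items.
    static double megabytesPer(double totalBytes, long count) {
        return totalBytes / count / 1e6;
    }

    // Assumption: one bit per document, as in a fixed-size bitset.
    static double bitsetMegabytes(long documents) {
        return documents / 8.0 / 1e6;
    }

    public static void main(String[] args) {
        long queryCount = queries(30, 300);  // ~30 queries/second for 5 minutes
        long fqClauses = queryCount * 5;     // ~5 fq parameters per query
        double totalBytes = 1.06e12;         // ~1.06TB allocated in getProcessedFilter

        System.out.printf("queries: %d, fq clauses: %d%n", queryCount, fqClauses);
        System.out.printf("MB per fq clause: %.1f%n", megabytesPer(totalBytes, fqClauses));
        System.out.printf("MB per query: %.1f%n", megabytesPer(totalBytes, queryCount));
        System.out.printf("MB per full bitset: %.3f%n", bitsetMegabytes(23_000_000L));
    }
}
```

Under that one-bit-per-document assumption a full bitset is a little under 3MB, so ~24MB per fq clause would correspond to several bitset-sized allocations per clause. That is an inference from the averages, not something measured directly.
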
> ----
> I loaded the same JFR up in Java Mission Control to see if there were other
> insights and found TLAB memory allocation details.
> !Screenshot 2022-11-17 at 13.25.57.png|width=750!
> Since most of these are large allocations, Java Mission Control is very
> helpful in saying that there are a large number of allocations outside of
> TLAB (~80%) and that you probably shouldn't do that.
> !Screenshot 2022-11-17 at 13.03.21.png|width=750!
> From what I understand, allocations outside of TLAB should in theory be a
> small portion of all allocations, but here they aren't...
> * https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/
> * https://www.opsian.com/blog/jvm-tlabs-important-multicore/
> * https://stackoverflow.com/questions/26351243/allocations-in-new-tlab-vs-allocations-outside-tlab