[jira] [Updated] (SOLR-16515) Remove synchronized access to cachedOrdMaps in SlowCompositeReaderWrapper

Jira Tue, 01 Nov 2022 07:17:05 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-16515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Torsten Bøgh Köster updated SOLR-16515:
---------------------------------------
    Description: 
The {{SlowCompositeReaderWrapper}} uses synchronized read and write access to 
its internal {{cachedOrdMaps}}. By using a {{ConcurrentHashMap}} instead of 
a {{LinkedHashMap}} as the underlying {{cachedOrdMaps}} implementation and 
the {{ConcurrentHashMap#computeIfAbsent}} method to compute cache values, we 
were able to reduce lock contention significantly.

h3. Background

Under heavy load, application halts inside Solr became a serious problem in 
high-traffic environments. Using Java Flight Recorder (JFR) recordings, we 
found a high accumulated halt time on the {{cachedOrdMaps}} in 
{{SlowCompositeReaderWrapper}}. Without this fix we could drive our machines 
only up to 25% CPU usage. With the fix applied, utilization of up to 80% is 
achievable.

h3. Description

Our Solr instances use the {{collapse}} component heavily. The instances 
run with 32 cores and 32 GB of Java heap on a rather small index (4 GB). The 
instances scale out at 50% CPU load. We take a 60-second Java Flight Recorder 
snapshot as soon as CPU usage exceeds 50%.

 !slow-composite-reader-wrapper-before.jpg|width=1024! 

During our 60 s Java Flight Recorder snapshot, the ~2k Jetty threads accumulated 
more than 16 h of locking time inside the {{SlowCompositeReaderWrapper}} (see 
screenshot). With this fix applied, locking is reduced to cache write accesses 
only. We validated this using another JFR snapshot:

 !slow-composite-reader-wrapper-after.jpg|width=1024! 
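
For a sense of scale of the "before" numbers, the accumulated locking time can 
be compared with the total thread time available in the snapshot window. The 
sketch below is only a back-of-the-envelope calculation using the rounded 
figures from this report (~2k threads, 60 s, 16 h); the class and method names 
are hypothetical and are not part of Solr or JFR.

```java
public class LockTimeEstimate {

    /**
     * Fraction of the total available thread time that was spent blocked.
     * All inputs are approximate values taken from the report.
     */
    public static double blockedFraction(int threads, double windowSeconds, double blockedHours) {
        double availableThreadSeconds = threads * windowSeconds; // 2000 * 60 = 120,000 s
        double blockedSeconds = blockedHours * 3600.0;           // 16 h = 57,600 s
        return blockedSeconds / availableThreadSeconds;
    }

    public static void main(String[] args) {
        // Rounded inputs: ~2k Jetty threads, 60 s snapshot, 16 h accumulated locking time.
        double fraction = blockedFraction(2000, 60.0, 16.0);
        System.out.println("Blocked fraction of thread time: " + fraction);
    }
}
```

With these rounded inputs, roughly 48% of all available thread time in the 
snapshot was spent waiting on the lock. Since the report says "more than 16h", 
this is best read as a lower bound.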

h3. Solution

We propose the following improvement to {{SlowCompositeReaderWrapper}}: 
remove the blocking {{synchronized}} access to the internal {{cachedOrdMaps}}. 
The implementation keeps the semantics of the {{getSortedDocValues}} and 
{{getSortedSetDocValues}} methods but moves the expensive 
{{OrdinalMap#build}} part into a producer. The {{ConcurrentHashMap}} is then 
accessed only through {{ConcurrentHashMap#computeIfAbsent}}, which is given 
that producer.
The current implementation uses the {{synchronized}} block not only to guard 
access to {{cachedOrdMaps}} but also to protect the critical section 
that gets, builds and puts the {{OrdinalMap}} into the cache. 
The decision whether a cacheable value should be built and added to the cache 
is also made inside that critical section. 
To support non-blocking read access to the cache, we move the building part of 
the critical section into a producer {{Function}}. The check whether we have 
a cacheable value is made up front. To make that decision properly we had to 
copy logic from {{MultiDocValues#getSortedSetValues}} and 
{{MultiDocValues#getSortedValues}} (the {{SlowCompositeReaderWrapper}} 
already contained duplicated code from those methods).
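
The pattern described above can be sketched roughly as follows. This is a 
minimal, self-contained illustration and not the actual patch: the class 
{{OrdMapCacheSketch}}, the stand-in type {{FakeOrdinalMap}}, the {{String}} 
cache key and the {{cacheable}} flag are all assumptions made for the example. 
Only {{ConcurrentHashMap#computeIfAbsent}} and {{java.util.function.Function}} 
are real JDK APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch; not the actual SlowCompositeReaderWrapper code. */
public class OrdMapCacheSketch {

    /** Stand-in for an expensive-to-build OrdinalMap (hypothetical type). */
    public static final class FakeOrdinalMap {
        final String field;

        FakeOrdinalMap(String field) {
            this.field = field;
        }
    }

    // A ConcurrentHashMap in place of a LinkedHashMap guarded by synchronized blocks.
    private final Map<String, FakeOrdinalMap> cachedOrdMaps = new ConcurrentHashMap<>();

    // Counts how often the expensive build runs (for illustration only).
    private final AtomicInteger builds = new AtomicInteger();

    // The producer holds the expensive build step that used to sit inside
    // the synchronized critical section.
    private final Function<String, FakeOrdinalMap> producer = field -> {
        builds.incrementAndGet();
        return new FakeOrdinalMap(field);
    };

    public FakeOrdinalMap getOrdinalMap(String field, boolean cacheable) {
        // The cacheability check is made up front, outside any lock.
        if (!cacheable) {
            return producer.apply(field);
        }
        // computeIfAbsent runs the producer only if the key is absent.
        return cachedOrdMaps.computeIfAbsent(field, producer);
    }

    public int buildCount() {
        return builds.get();
    }

    public static void main(String[] args) {
        OrdMapCacheSketch cache = new OrdMapCacheSketch();
        FakeOrdinalMap first = cache.getOrdinalMap("category", true);
        FakeOrdinalMap second = cache.getOrdinalMap("category", true);
        System.out.println(first == second);    // true: second call is a cache hit
        System.out.println(cache.buildCount()); // 1: the producer ran only once
    }
}
```

One design point to keep in mind with this approach: the function passed to 
{{computeIfAbsent}} must not modify the same map while it runs, so the 
producer should only build and return the value.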

h3. Summary

This change removes most blocking access inside the 
{{SlowCompositeReaderWrapper}}. Despite its name, it is now capable 
of a much higher request throughput.
This change was developed jointly by Dennis Berger, Torsten Bøgh Köster 
and Marco Petris.

  was:
The {{SlowCompositeReaderWrapper}} uses synchronized read and write access to 
its internal {{cachedOrdMaps}}. By using a {{ConcurrentHashMap}} instead of 
a {{LinkedHashMap}} as the underlying {{cachedOrdMaps}} implementation and 
the {{ConcurrentHashMap#computeIfAbsent}} method to compute cache values, we 
were able to reduce lock contention significantly.

h3. Background

Under heavy load, application halts inside Solr became a serious problem in 
high-traffic environments. Using Java Flight Recorder (JFR) recordings, we 
found a high accumulated halt time on the {{cachedOrdMaps}} in 
{{SlowCompositeReaderWrapper}}. Without this fix we could drive our machines 
only up to 25% CPU usage. With the fix applied, utilization of up to 80% is 
achievable.

h3. Description

Our Solr instances use the {{collapse}} component heavily. The instances 
run with 32 cores and 32 GB of Java heap on a rather small index (4 GB). The 
instances scale out at 50% CPU load. We take a 60-second Java Flight Recorder 
snapshot as soon as CPU usage exceeds 50%.

 !slow-composite-reader-wrapper-before.jpg|height=1024px! 

During our 60 s Java Flight Recorder snapshot, the ~2k Jetty threads accumulated 
more than 16 h of locking time inside the {{SlowCompositeReaderWrapper}} (see 
screenshot). With this fix applied, locking is reduced to cache write accesses 
only. We validated this using another JFR snapshot:

 !slow-composite-reader-wrapper-after.jpg|height=1024px! 

h3. Solution

We propose the following improvement to {{SlowCompositeReaderWrapper}}: 
remove the blocking {{synchronized}} access to the internal {{cachedOrdMaps}}. 
The implementation keeps the semantics of the {{getSortedDocValues}} and 
{{getSortedSetDocValues}} methods but moves the expensive 
{{OrdinalMap#build}} part into a producer. The {{ConcurrentHashMap}} is then 
accessed only through {{ConcurrentHashMap#computeIfAbsent}}, which is given 
that producer.
The current implementation uses the {{synchronized}} block not only to guard 
access to {{cachedOrdMaps}} but also to protect the critical section 
that gets, builds and puts the {{OrdinalMap}} into the cache. 
The decision whether a cacheable value should be built and added to the cache 
is also made inside that critical section. 
To support non-blocking read access to the cache, we move the building part of 
the critical section into a producer {{Function}}. The check whether we have 
a cacheable value is made up front. To make that decision properly we had to 
copy logic from {{MultiDocValues#getSortedSetValues}} and 
{{MultiDocValues#getSortedValues}} (the {{SlowCompositeReaderWrapper}} 
already contained duplicated code from those methods).

h3. Summary

This change removes most blocking access inside the 
{{SlowCompositeReaderWrapper}}. Despite its name, it is now capable 
of a much higher request throughput.
This change was developed jointly by Dennis Berger, Torsten Bøgh Köster 
and Marco Petris.


> Remove synchronized access to cachedOrdMaps in SlowCompositeReaderWrapper
> -------------------------------------------------------------------------
>
>                 Key: SOLR-16515
>                 URL: https://issues.apache.org/jira/browse/SOLR-16515
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search
>    Affects Versions: 9.0, 8.11.2
>            Reporter: Torsten Bøgh Köster
>            Priority: Major
>         Attachments: slow-composite-reader-wrapper-after.jpg, 
> slow-composite-reader-wrapper-before.jpg



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
