[https://issues.apache.org/jira/browse/SOLR-16515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Torsten Bøgh Köster updated SOLR-16515:
---------------------------------------
Description:
The {{SlowCompositeReaderWrapper}} uses synchronized read and write access to
its internal {{cachedOrdMaps}}. By using a {{ConcurrentHashMap}} instead of
a {{LinkedHashMap}} as the underlying {{cachedOrdMaps}} implementation, and
the {{ConcurrentHashMap#computeIfAbsent}} method to compute cache values, we
were able to reduce locking contention significantly.
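A minimal sketch of the {{ConcurrentHashMap#computeIfAbsent}} behaviour this change relies on: the mapping function is applied at most once per absent key, even when many threads request the same key at the same moment. The class name, the {{expensiveBuild}} stand-in for {{OrdinalMap#build}}, the field name and the thread count are hypothetical and not taken from Solr.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ComputeIfAbsentDemo {
  // Counts how often the (expensive) builder function actually runs.
  static final AtomicInteger builds = new AtomicInteger();

  // Hypothetical stand-in for an expensive build step such as OrdinalMap#build.
  static String expensiveBuild(String field) {
    builds.incrementAndGet();
    return "ordmap:" + field;
  }

  // Starts `threads` workers at the same time, all asking for the same key,
  // and returns how many times the builder ran.
  static int runConcurrentLookups(int threads, String field) {
    Map<String, String> cache = new ConcurrentHashMap<>();
    builds.set(0);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> {
        try {
          start.await(); // release all workers together to maximize the race
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        // computeIfAbsent applies the function at most once per key; callers
        // racing on the same absent key wait for that single build.
        cache.computeIfAbsent(field, ComputeIfAbsentDemo::expensiveBuild);
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return builds.get();
  }
}
```

Calling {{runConcurrentLookups(32, "category")}} should report a single build, no matter how many threads race for the key.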
h3. Background
Under heavy load, we found that application halts inside Solr become a
serious problem in high-traffic environments. Using Java Flight Recorder
recordings, we found high accumulated application halts on the
{{cachedOrdMaps}} in {{SlowCompositeReaderWrapper}}. Without this fix, we
could utilize our machines only up to 25% CPU. With the fix applied,
utilization of up to 80% is achievable.
h3. Description
Our Solr instances use the {{collapse}} component heavily. The instances
run with 32 cores and a 32 GB Java heap on a rather small index (4 GB). The
instances scale out at 50% CPU load. We take 60-second Java Flight Recorder
snapshots as soon as CPU usage exceeds 50%.
!slow-composite-reader-wrapper-before.jpg|width=1024!
During a 60-second Java Flight Recorder snapshot, the ~2k Jetty threads
accumulated more than 16 hours of locking time inside the
{{SlowCompositeReaderWrapper}} (see screenshot). With this fix applied,
locking is reduced to cache write accesses only. We validated this using
another JFR snapshot:
!slow-composite-reader-wrapper-after.jpg|width=1024!
h3. Solution
We propose the following improvement to the {{SlowCompositeReaderWrapper}}:
remove the blocking {{synchronized}} access to the internal {{cachedOrdMaps}}.
The implementation keeps the semantics of the {{getSortedDocValues}} and
{{getSortedSetDocValues}} methods but moves the expensive part,
{{OrdinalMap#build}}, into a producer. The {{ConcurrentHashMap}} is accessed
only through {{ConcurrentHashMap#computeIfAbsent}}, with the producer as the
mapping function.
The current implementation uses the {{synchronized}} block not only to lock
access to the {{cachedOrdMaps}} but also to protect the critical section
that gets, builds and puts the {{OrdinalMap}} into the cache. Inside this
critical section, it also decides whether a cacheable value should be built
and added to the cache.
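The pattern described above can be sketched roughly as follows. This is a simplified, hypothetical model: the {{OrdMap}} type, the {{isCacheable}} predicate and the {{build}} method are illustrative stand-ins, not Solr or Lucene APIs. It shows the two properties that cause contention: every caller, including readers of an already cached value, serializes on one monitor, and the expensive build runs while that monitor is held.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SynchronizedOrdMapCache {
  // Hypothetical stand-in for Lucene's OrdinalMap.
  static final class OrdMap {
    final String field;
    OrdMap(String field) { this.field = field; }
  }

  private final Map<String, OrdMap> cachedOrdMaps = new LinkedHashMap<>();
  int builds = 0; // only modified while holding the cachedOrdMaps monitor

  // Hypothetical cacheability rule; in Solr this depends on the doc values
  // type and the reader's cache key.
  boolean isCacheable(String field) {
    return !field.startsWith("nocache_");
  }

  // Hypothetical expensive build step (stands in for OrdinalMap#build).
  OrdMap build(String field) {
    builds++;
    return new OrdMap(field);
  }

  OrdMap get(String field) {
    // Get, cacheability decision, build and put all happen in one critical
    // section, so even cache hits wait behind an in-progress build.
    synchronized (cachedOrdMaps) {
      OrdMap map = cachedOrdMaps.get(field);
      if (map == null) {
        map = build(field);
        if (isCacheable(field)) {
          cachedOrdMaps.put(field, map);
        }
      }
      return map;
    }
  }
}
```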
To support non-blocking read access to the cache, we move the building part of
the critical section into a producer {{Function}}. The check whether a value is
cacheable is made upfront. To make that decision correctly, we had to
take logic from {{MultiDocValues#getSortedSetValues}} and
{{MultiDocValues#getSortedValues}} (the {{SlowCompositeReaderWrapper}}
already contained duplicated code from those methods).
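A matching sketch of the proposed shape, using the same kind of hypothetical stand-ins: the cacheability check happens before the map is touched, and the build lives in a producer {{Function}} that is only passed to {{computeIfAbsent}}. In the real code the cacheability decision depends on the doc values returned for the field and on the reader's cache key; the {{nocache_}} prefix rule below is purely illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class ConcurrentOrdMapCache {
  // Hypothetical stand-in for Lucene's OrdinalMap.
  static final class OrdMap {
    final String field;
    OrdMap(String field) { this.field = field; }
  }

  private final Map<String, OrdMap> cachedOrdMaps = new ConcurrentHashMap<>();
  final AtomicInteger builds = new AtomicInteger();

  // Producer: the expensive build step, no longer inside a synchronized block.
  private final Function<String, OrdMap> producer = field -> {
    builds.incrementAndGet();
    return new OrdMap(field);
  };

  // Hypothetical cacheability rule, evaluated upfront.
  boolean isCacheable(String field) {
    return !field.startsWith("nocache_");
  }

  OrdMap get(String field) {
    if (!isCacheable(field)) {
      // Not cacheable: build directly, without touching the map.
      return producer.apply(field);
    }
    // The map is accessed only via computeIfAbsent; the producer runs at most
    // once per key and there is no single monitor shared by all callers.
    return cachedOrdMaps.computeIfAbsent(field, producer);
  }
}
```

Values that are not cacheable are rebuilt on every call, as in the synchronized version; cacheable ones are built once and afterwards returned without a cache-wide lock.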
h3. Summary
This change removes most blocking access inside the
{{SlowCompositeReaderWrapper}}; despite its name, it is now capable
of much higher request throughput.
This change was written jointly by Dennis Berger, Torsten Bøgh Köster
and Marco Petris.
> Remove synchronized access to cachedOrdMaps in SlowCompositeReaderWrapper
> -------------------------------------------------------------------------
>
> Key: SOLR-16515
> URL: https://issues.apache.org/jira/browse/SOLR-16515
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: search
> Affects Versions: 9.0, 8.11.2
> Reporter: Torsten Bøgh Köster
> Priority: Major
> Attachments: slow-composite-reader-wrapper-after.jpg,
> slow-composite-reader-wrapper-before.jpg
>