countmdm opened a new pull request, #15618:
URL: https://github.com/apache/lucene/pull/15618
### Description
At LinkedIn, we use Lucene in several important search applications. We recently
started to look at possible GC and memory footprint optimizations in several of
them. We first analyzed a memory allocation profile and found that around 1/3
of memory allocation, in bytes per second, is due to byte[] arrays managed by
the SegmentTermsEnumFrame class ('suffixBytes', 'statBytes', etc.). Next, we analyzed
a heap dump and found that of these arrays, around 80% are empty. That is, they
contain only zeroes, which likely signals that they are never used. Of the
remaining arrays, a significant percentage have many trailing zeroes, which
likely means that the default size is too big and these arrays are
underutilized. All in all, we estimated that if the arrays in question were
allocated lazily, the total memory allocation rate should drop by ~25%. That's a
huge saving, potentially allowing us to run the same apps with much higher QPS.
We made an experimental local change to Lucene 9.8.0, which we currently use,
and confirmed that the memory allocation rate drops as expected, after which QPS
can be raised significantly without affecting the application's latency.
Please note that this PR targets the 'main' branch. Anyone backporting
it to 9.8.0 or other older branches will need to implement a similar
fix for the 'floorData' field, which exists only in those branches.
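
For readers unfamiliar with the pattern, the lazy-allocation idea described above can be sketched roughly as follows. This is a minimal illustration only, not the actual diff in this PR: the class `LazyScratchBuffer`, its methods, and the `MIN_ALLOC` value are hypothetical names chosen for the example and do not exist in Lucene.

```java
/**
 * Hypothetical stand-in for a per-frame scratch buffer (not a Lucene class).
 * The backing array starts as a shared zero-length sentinel and is only
 * replaced by a real allocation the first time a caller needs capacity.
 */
final class LazyScratchBuffer {
  // Shared by all instances, so an unused buffer costs no per-instance array.
  private static final byte[] EMPTY = new byte[0];

  // Illustrative minimum size; an assumption, not a measured or tuned value.
  private static final int MIN_ALLOC = 16;

  private byte[] bytes = EMPTY;

  /** True once a real array has been allocated for this instance. */
  boolean isAllocated() {
    return bytes != EMPTY;
  }

  /**
   * Returns an array with at least minSize bytes. Allocates only when the
   * current array is too small; previous contents are NOT preserved, which
   * assumes the caller overwrites the buffer after asking for it.
   */
  byte[] get(int minSize) {
    if (bytes.length < minSize) {
      bytes = new byte[Math.max(minSize, MIN_ALLOC)];
    }
    return bytes;
  }
}
```

With this shape, an instance that is constructed but never asked for capacity holds only a reference to the shared empty array, which is the case the heap dump suggested is the most common one.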
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]