Thank you very much, all, for your quick responses! I've been busy with other 
stuff recently, but will resume my attempts to generate a lucene PR soon.

In the meantime, our early experiments suggest that with my proposed change, we 
can improve the performance of one of our search apps by 20-25 percent. So at 
least for our company this optimization is very worthwhile.

Misha

________________________________
From: Dawid Weiss <[email protected]>
Sent: Friday, January 23, 2026 1:55 AM
To: [email protected] <[email protected]>
Cc: Misha Dmitriev <[email protected]>
Subject: Re: A deficiency in lucene code that affects memory footprint and GC


Hi Misha,

Using a github fork and pull request is by far the best way to do it because it 
allows automated checks to be run, easier reviews, etc. It is also a good 
learning exercise to get up to speed with working with multiple remotes in git, 
if you're willing to invest some time in it (you can have both the upstream 
lucene repo and your own fork as two remotes in the same local git clone).
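The two-remote workflow Dawid describes can be sketched as below. This is a local, offline stand-in: the two bare repositories substitute for github.com/apache/lucene (upstream) and your personal fork; with real GitHub URLs the commands are the same. Pushing the branch to the fork rather than to upstream is what avoids the 403 error shown further down in the thread.

```shell
# Local stand-ins for the upstream repo and your fork
git init --bare upstream.git          # stands in for apache/lucene
git init --bare fork.git              # stands in for your fork of it

# Clone upstream; 'origin' now points at the upstream repo
git clone upstream.git work
cd work

# Add your fork as a second remote in the same clone
git remote add fork ../fork.git

# Create the feature branch and commit the fix on it
git checkout -b optimize-STEF-main
git -c user.name="Misha" -c user.email="misha@example.com" \
    commit --allow-empty -m "Lazily allocate SegmentTermsEnumFrame buffers"

# Push to the fork, NOT to upstream -- you have write access to your
# fork, and the pull request is then opened from the fork's branch
git push -u fork optimize-STEF-main
```

From here, GitHub's web UI offers to open a pull request from the fork's branch against apache/lucene's main branch.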

Dawid

On Fri, Jan 23, 2026 at 2:30 AM Misha Dmitriev via java-user 
<[email protected]<mailto:[email protected]>> wrote:
Hi again Dawid,

Forgive my ignorance, since I've never contributed to Lucene or Apache before. 
I created a git branch locally based on the Lucene main branch; made, built, 
checked, and committed my fix; and then tried to create a PR by pushing that 
branch, see below. Unfortunately, I get an error. I used a classic PAT as the 
password, so the problem seems to be not with the password itself, but with 
some missing "access rights". Could you please take a look? I am using my 
GitHub login countmdm, email [email protected]

Misha


$ git push -u origin optimize-STEF-main
Username for 'https://github.com': countmdm
Password for 'https://[email protected]':
remote: Permission to apache/lucene.git denied to countmdm.
fatal: unable to access 'https://github.com/apache/lucene.git/': The requested 
URL returned error: 403

________________________________
From: Dawid Weiss <[email protected]>
Sent: Wednesday, January 21, 2026 10:21 PM
To: [email protected] <[email protected]>
Cc: Misha Dmitriev <[email protected]>
Subject: Re: A deficiency in lucene code that affects memory footprint and GC


Hi Misha,

Please provide a pull request. Small, isolated improvements are easier for us 
to review and digest than large changes, but all contributions are welcome.

Also, a lot of (trained) eyes are looking at this code... very often, reports 
of Lucene not performing well are caused by incorrect usage rather than by 
problems in the implementation. It would be good to share the full context in 
which the problem occurs.

Dawid

On Wed, Jan 21, 2026 at 11:13 PM Misha Dmitriev via java-user 
<[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>
 wrote:
Hi Lucene community,

At LinkedIn, we use Lucene in some important search apps. We recently found 
some problems with GC and memory footprint in one of them. We took a heap dump 
and analyzed it with JXRay (https://jxray.com/). Unfortunately, we cannot share 
the entire JXRay analysis due to security restrictions, but we can share one 
excerpt from it, see below. It comes from section 11 of the JXRay report, 
“Bad Primitive Arrays”, which tells us how much memory is wasted due to empty 
or under-utilized primitive arrays. That section says that nearly 4 GB of 
memory (25.6% of the used heap) is wasted, and it turns out that most of that 
is due to byte[] arrays managed by the SegmentTermsEnumFrame class.

[screenshot omitted: JXRay “Bad Primitive Arrays” breakdown for 
SegmentTermsEnumFrame arrays]

To clarify: in the above screenshot, 80% of all arrays pointed to by the 
suffixBytes field are empty, i.e. contain only zeros, which likely means that 
they have never been used. Of the remaining arrays, 3% are “trail-0s”, i.e. 
more than half of their trailing elements are zero, so they were only partially 
utilized. Thus only 17% of these arrays have been utilized more or less fully. 
The same is true for all other byte[] arrays managed by SegmentTermsEnumFrame. 
Note that other sections of the heap dump report make it clear that the 
majority of these objects are garbage, i.e. they have already been used and 
discarded. So at least 80% of the memory that was allocated for these byte[] 
arrays was never used and was wasted. From separate memory allocation 
profiling, we estimated that these arrays are responsible for ~2 GB/sec of 
memory allocation. If they were allocated lazily rather than eagerly, i.e. just 
before they are actually used, we could potentially reduce their share of the 
memory allocation rate from 2 GB/sec to (1 - 0.8) * 2 = 0.4 GB/sec.

A switch from eager to lazy allocation of a data structure is usually easy to 
implement. Let’s take a quick look at the source code 
(https://fossies.org/linux/www/lucene-10.3.2-src.tgz/lucene-10.3.2/lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/blocktree/SegmentTermsEnumFrame.java).
The suffixBytes array is used in the following pattern:

// Eager construction with hardcoded size
byte[] suffixBytes = new byte[128];

…  // Fast forward to the loadBlock() method
…
if (suffixBytes.length < numSuffixBytes) {
  // If we need to read more than 128 bytes, increase the array…
  // … or more precisely, throw away the old array and allocate another one
  suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
}

From this code, it’s clear that two negative things can happen:

  1. suffixBytes may not be used at all (the loadBlock() method may not be 
     called or may return early). The memory used by the array is then 
     completely wasted.
  2. If numSuffixBytes happens to be greater than 128, the eagerly allocated 
     array is discarded, and the memory used by it is wasted.

And as our heap dump illustrates, these things likely happen very often. To 
address this problem, it would be sufficient to change the code as follows:

// Avoid eager construction
byte[] suffixBytes;
…
if (suffixBytes == null || suffixBytes.length < numSuffixBytes) {
  // Allocate the array on first use, or throw away the too-small
  // old array and allocate a bigger one
  suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
}
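The proposed lazy pattern can be sketched in isolation as follows. Frame below is a simplified, hypothetical stand-in for SegmentTermsEnumFrame, and oversize() is a simplified substitute for Lucene's ArrayUtil.oversize(), not the real implementation:

```java
// Minimal sketch of the proposed lazy-allocation pattern.
public class Frame {
  // Lazy: no buffer is allocated until loadBlock() actually needs one.
  byte[] suffixBytes;

  void loadBlock(int numSuffixBytes) {
    // Allocate on first use, or replace the buffer when it is too small
    if (suffixBytes == null || suffixBytes.length < numSuffixBytes) {
      suffixBytes = new byte[oversize(numSuffixBytes)];
    }
    // ... read numSuffixBytes bytes into suffixBytes ...
  }

  // Grow ~50% past the requested size, roughly what oversizing does
  static int oversize(int minSize) {
    return minSize + (minSize >> 1);
  }

  public static void main(String[] args) {
    Frame frame = new Frame();
    System.out.println(frame.suffixBytes == null);        // true: nothing allocated yet
    frame.loadBlock(200);
    System.out.println(frame.suffixBytes.length >= 200);  // true: allocated on demand
  }
}
```

A Frame that never reaches loadBlock() now allocates no buffer at all, and a frame whose first block needs more than 128 bytes allocates exactly one buffer instead of two.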

Note that reducing the memory allocation rate primarily reduces CPU usage 
and/or improves latency. That’s because each object allocation requires work 
from the JVM: updating pointers and zeroing all of the object's bytes. GCing 
these objects is also CPU-intensive and pauses app threads, which affects 
latency. However, once the memory allocation rate is reduced, it may also 
become possible to shrink the JVM heap. So the ultimate win is in both CPU and 
memory.

Please let us know how we can proceed with this. The proposed change is 
trivial, so perhaps it can be made quickly by some established Lucene 
contributor. If not, I guess I can make it myself and then hope that it goes 
through review and release in a reasonable time.

Misha
