Hi,

This could be related to a bug or a limitation of the following change:

1. GITHUB#13570 <https://github.com/apache/lucene/pull/13570>,
   GITHUB#13574 <https://github.com/apache/lucene/pull/13574>,
   GITHUB#13535 <https://github.com/apache/lucene/pull/13535>: Avoid
   performance degradation with closing shared Arenas. Closing many
   individual index files can potentially lead to a degradation in
   execution performance. Index files are mmapped one-to-one with the
   JDK's foreign shared Arena. The JVM deoptimizes the top few frames
   of all threads when closing a shared Arena (see JDK-8335480). We
   mitigate this situation when running with JDK 21 and greater by
   1) using a confined Arena where appropriate, and 2) grouping files
   from the same segment into a single shared Arena. A system property
   has been added that allows controlling the total maximum number of
   mmapped files that may be associated with a single shared Arena.
   For example, to set the max number of permits to 256, pass the
   following on the command line:
   -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=256.
   Setting a value of 1 associates a single file with a single shared
   arena. (Chris Hegarty, Michael Gibney, Uwe Schindler)
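
To make the grouping concrete, here is a minimal sketch (my own
illustration, not Lucene code) of the JDK mechanism the entry
describes, assuming JDK 22+ where the foreign memory API is final:
several files are mapped into one shared Arena, and closing that
Arena releases all of their mappings at once, which is the operation
that triggers the deoptimization from JDK-8335480:

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class SharedArenaSketch {
      public static void main(String[] args) throws Exception {
        try (Arena arena = Arena.ofShared()) {
          // hypothetical file names; in Lucene these would be the files
          // of one index segment
          for (String name : new String[] {"_0.cfs", "_0.dvd"}) {
            try (FileChannel ch =
                FileChannel.open(Path.of(name), StandardOpenOption.READ)) {
              // each map() call consumes one "permit" of the arena
              MemorySegment mapped =
                  ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
              // ... read from 'mapped'; the mapping outlives the channel ...
            }
          }
        } // arena.close() unmaps everything together: one deoptimization
      }
    }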

Actually it looks like there are many deletes on the same index segment, so the segment itself is not closed but the deletes are updated over and over. Because the whole segment uses the same shared memory arena, none of the up to 1024 (the default value) mappings in that arena are released until the arena is closed, and every one of those mappings counts against the maxMapCount limit.
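
You can verify this: /proc/{pid}/maps marks unlinked files with a
"(deleted)" suffix, so a rough count of the stale doc values mappings
(replace <pid> with the OpenSearch process id) would be:

    grep -c '\.dvd (deleted)$' /proc/<pid>/maps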

To work around the issue you can reduce the setting as described above by passing it as a separate system property on OpenSearch's command line. I'd recommend using a smaller value like 64 for systems with many indexes.
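
For example (assuming you use OpenSearch's standard config/jvm.options
file for JVM flags), adding a single line like the following and
restarting the nodes applies the lower limit:

    -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=64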

Please tell us what you find out! Did reducing the sharedArenaMaxPermits limit help? Maybe a good idea would be to change Lucene / OpenSearch to open deletion files in a separate arena or use READONCE to load them into memory.
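
For illustration, a READONCE open would look like the following sketch
(the file name is made up; if I read the changes above correctly,
READONCE inputs get a confined Arena and therefore don't hold a permit
in the segment's shared arena):

    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.IOContext;
    import org.apache.lucene.store.IndexInput;

    // 'dir' is an already-open MMapDirectory; the file name is hypothetical
    try (IndexInput in = dir.openInput("_0_5.dvd", IOContext.READONCE)) {
      byte[] buf = new byte[(int) in.length()];
      in.readBytes(buf, 0, buf.length);
    }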

Uwe

On 07.05.2025 at 03:44, Justin Borromeo wrote:
Hi all,

After upgrading our OpenSearch cluster from 2.16.0 to 2.19.1 (moving from
Lucene 9.10 to Lucene 9.12), our largest clusters started crashing with the
following error:

# There is insufficient memory for the Java Runtime Environment to continue.

# Native memory allocation (malloc) failed to allocate 2097152 bytes. Error
detail: AllocateHeap

We narrowed down the issue to the vm.max_map_count limit (262144) being
reached. Prior to the server crash, we see the map count (measured by
`cat /proc/{pid}/maps | wc -l`) approach the 262144 limit we set.
Looking at one of the outputs of `cat /proc/{pid}/maps`, we observed
that 246K of the 252K maps are for deleted doc values (.dvd) files.

Is this expected?  If so, were there any changes in the Lucene codebase
between those two versions that could have caused this?  Any suggestions on
debugging?

Thanks in advance and sorry if this is a better question for the
OpenSearch community or the Lucene developer list.

Justin Borromeo

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de
