Hi,
This could be related to a bug or limitation of the following change:
1. GITHUB#13570
<https://github.com/apache/lucene/pull/13570>, GITHUB#13574
<https://github.com/apache/lucene/pull/13574>, GITHUB#13535
<https://github.com/apache/lucene/pull/13535>: Avoid performance
degradation with closing shared Arenas. Closing many individual
index files can potentially lead to a degradation in execution
performance. Index files are mmapped one-to-one with the JDK's
foreign shared Arena. The JVM deoptimizes the top few frames of all
threads when closing a shared Arena (see JDK-8335480). We mitigate
this situation when running with JDK 21 and greater, by *1) using a
confined Arena where appropriate, and 2) grouping files from the
same segment to a single shared Arena*. A system property has been
added that allows controlling the total maximum number of mmapped
files that may be associated with a single shared Arena. For
example, to set the max number of permits to 256, pass the following
on the command line:
-Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=256.
Setting a value of 1 associates a single file to a single shared arena.
(Chris Hegarty, Michael Gibney, Uwe Schindler)
Actually, it looks like there are many deletes on the same index
segment, so the segment itself is not closed but the deletes are
updated over and over. Because the whole segment uses the same shared
memory arena, that arena is never closed and won't release its up to
1024 (the default value) mappings, all of which count against the
maxMapCount limit.
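If you want to confirm this on a running node, a quick check along
these lines should work (a rough sketch; replace <pid> with the
OpenSearch process id):

   # total number of mappings vs. mappings that point at doc-values files
   wc -l /proc/<pid>/maps
   grep -c '\.dvd' /proc/<pid>/maps

If the .dvd count dominates and keeps growing while the segment stays
open, you are seeing exactly this pattern.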
To work around the issue, you can reduce the setting described above
by passing it as a separate system property on OpenSearch's command
line. I'd recommend using a smaller value such as 64 for systems with
many indexes.
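For example, assuming OpenSearch's usual config/jvm.options mechanism
(the exact file location depends on your installation), add a line like:

   -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=64

With a smaller permit count, fewer files share one arena, so the
mappings of deleted files get released sooner, at the cost of more
arena close operations.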
Please tell us what you find out! Did reducing the
sharedArenaMaxPermits limit help? Maybe a good idea would be to change
Lucene / OpenSearch to open deletion files in a separate arena, or to
use READONCE to load them into memory.
Uwe
On 07.05.2025 at 03:44, Justin Borromeo wrote:
Hi all,
After upgrading our OpenSearch cluster from 2.16.0 to 2.19.1 (moving from
Lucene 9.10 to Lucene 9.12), our largest clusters started crashing with the
following error:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 2097152 bytes. Error detail: AllocateHeap
We narrowed down the issue to the vm max map count (262144) being reached.
Prior to server crash, we see map count (measured by `cat /proc/{pid}/maps
| wc -l`) approach the 262144 limit we set. Looking at one of the outputs
of `cat /proc/{pid}/maps`, we observed that 246K of the 252K maps are for
deleted doc values (.dvd) files.
Is this expected? If so, were there any changes in the Lucene codebase
between those two versions that could have caused this? Any suggestions on
debugging?
Thanks in advance and sorry if this is a better question for the OS
community or the Lucene developer list.
Justin Borromeo
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de