Hi,

you can try to reduce sharedArenaMaxPermits further, down to 1 (which restores the old behaviour). I don't know what OpenSearch is doing with so many doc values updates, but you're already very close to the limits. If you need that many indexes on one machine and don't mind "slower" updates, you can for sure restore the previous behaviour.

The problem with the limits above is that Lucene also reopens and merges segment files. During those actions, and especially after IndexWriter has closed and deleted a segment, there can still be open references to its files, which makes the mappings survive longer than wanted.
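As a rough illustration of that last point, here is a minimal, self-contained sketch (the path and field name are made up for the example, this is not code from Lucene or OpenSearch) showing how an open reader keeps the mmaps of already-deleted segment files alive until it is closed:

```java
import java.nio.file.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.MMapDirectory;

public class MappingLifetimeSketch {
  public static void main(String[] args) throws Exception {
    try (MMapDirectory dir = new MMapDirectory(Path.of("/tmp/mmap-demo"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (String id : new String[] {"1", "2"}) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.NO));
        writer.addDocument(doc);
        writer.commit();                      // two commits -> two small segments
      }

      DirectoryReader reader = DirectoryReader.open(dir);  // pins the current segment files

      writer.forceMerge(1);                   // rewrites both segments into one
      writer.commit();                        // the old segment files are deleted on disk ...

      // ... but their mmaps stay alive (visible as "(deleted)" in /proc/<pid>/maps)
      // until the last reference goes away:
      reader.close();
    }
  }
}
```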

Of course there could be a bug in OpenSearch where it does not close a file correctly. They have a lot of customized codec code; maybe there's a bug in it. With plain Lucene it should not run out of file handles or mappings. We did not hear any complaints from Elasticsearch people, which leads me to think that this might be specific to OpenSearch.

So my only suggestion would be to reduce the limit to 1, which restores the previous behaviour. If you then still get problems, there's a "file not closed" bug somewhere in the custom Lucene code of OpenSearch. Maybe the recycling of IndexSearchers has problems. It is hard to figure out. Of course I would start searching at the places where the .dvd files are regenerated. Maybe that helps nailing down the issue.

Uwe

On 13.05.2025 at 04:12, Justin Borromeo wrote:
Hi Uwe,

Setting -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=64
didn't seem to help and we're still seeing restarts.  A question about your
response: what are normal update ratios?

Each of our machines is running 32 OpenSearch shards (Lucene indexes), each
with about 52 segments.  I could see how we're running into the 262144
max_map_count limit since 1024*32*52=1.7M.  If this were the cause though,
we'd expect the max permits change to help since 64 * 32 * 52 = 106K?
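For reference, here's the back-of-envelope calculation I'm using (just worst-case arithmetic, assuming every segment keeps its full permit count mapped; the shard and segment counts are the ones quoted above):

```java
public class MapBudget {
  public static void main(String[] args) {
    final long maxMapCount = 262_144;   // vm.max_map_count on our hosts
    final int shards = 32;              // OpenSearch shards (Lucene indexes) per machine
    final int segmentsPerShard = 52;    // observed average

    // Worst case: every segment keeps sharedArenaMaxPermits mappings alive.
    for (int permits : new int[] {1024, 64, 1}) {
      long worstCase = (long) shards * segmentsPerShard * permits;
      System.out.printf("permits=%4d -> %,10d mappings (limit %,d)%n",
          permits, worstCase, maxMapCount);
    }
  }
}
```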

Is the issue here that we're mapping too many files or that these files'
memory mappings aren't being "released" after being deleted?

Justin



On Fri, May 9, 2025 at 7:03 AM Uwe Schindler <u...@thetaphi.de> wrote:

Hi,

Did the sharedArenaMaxPermits=64 help?

Actually, sorry for the previous answer; I did not recognize that you were talking about doc values updates, I just saw "deleted". But basically the issue is the same: every update or delete will create a new file belonging to the same segment. As each segment by default can have up to 1024 mappings, this can add up quite fast toward the max mmap count. A typical index has 20 segments, so this could sum up to 20,000 mappings per index.
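To make that concrete, here is a small standalone sketch (the path and field names are invented for the example) showing how every committed in-place doc values update adds another .dvd generation file that still belongs to the same segment:

```java
import java.nio.file.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.MMapDirectory;

public class DocValuesUpdateSketch {
  public static void main(String[] args) throws Exception {
    try (MMapDirectory dir = new MMapDirectory(Path.of("/tmp/dv-demo"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      doc.add(new StringField("id", "1", Field.Store.NO));
      doc.add(new NumericDocValuesField("counter", 0L));
      writer.addDocument(doc);
      writer.commit();                          // writes the base segment, e.g. _0

      // Each committed in-place doc values update writes a new generation file
      // (something like _0_1_Lucene90_0.dvd, _0_2_Lucene90_0.dvd, ...) that still
      // belongs to segment _0 and therefore shares that segment's Arena.
      for (long i = 1; i <= 5; i++) {
        writer.updateNumericDocValue(new Term("id", "1"), "counter", i);
        writer.commit();
      }
    }
  }
}
```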

I don't remember why Chris set the limit to 1024. In most cases segments will only have a dozen files at most, and with normal update ratios it should be fine to limit the number of open mappings to 64 or even lower.

If the change helps for you, we can open an issue to adapt the defaults for the shared arenas. BTW, if you want to go back to the default of previous Lucene versions, use 1, but this could degrade performance when you have many updates.

Uwe

On 07.05.2025 at 21:48, Justin Borromeo wrote:
Hi Uwe,

Thanks for the response.  We've tried setting sharedArenaMaxPermits to 64; I'll update this thread once we get some data.

One thing I don't understand: why does the list of deleted mmapped files only include doc values files?  If your theory is correct and this is caused by deletes being updated over and over, wouldn't we expect only .liv files to be deleted?

Output:
```
7ed8a2754000-7ed8a2757000 r--s 00000000 08:10 80872480   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/196/index/_9h0q_de_Lucene90_0.dvd (deleted)
7ed8a2757000-7ed8a275c000 r--s 00000000 08:10 78838113   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/119/index/_912j_m9_Lucene90_0.dvd (deleted)
7ed8a275c000-7ed8a275f000 r--s 00000000 08:10 78830146   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/126/index/_9buk_4e_Lucene90_0.dvd (deleted)
```

Justin

On Wed, May 7, 2025 at 9:50 AM Uwe Schindler <u...@thetaphi.de> wrote:

Hi,

this could be related to a bug or limitation of the following change:

   1. GITHUB#13570 <https://github.com/apache/lucene/pull/13570>, GITHUB#13574 <https://github.com/apache/lucene/pull/13574>, GITHUB#13535 <https://github.com/apache/lucene/pull/13535>: Avoid performance degradation with closing shared Arenas. Closing many individual index files can potentially lead to a degradation in execution performance. Index files are mmapped one-to-one with the JDK's foreign shared Arena. The JVM deoptimizes the top few frames of all threads when closing a shared Arena (see JDK-8335480). We mitigate this situation when running with JDK 21 and greater, by 1) using a confined Arena where appropriate, and 2) grouping files from the same segment to a single shared Arena. A system property has been added that allows to control the total maximum number of mmapped files that may be associated with a single shared Arena. For example, to set the max number of permits to 256, pass the following on the command line: -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=256. Setting a value of 1 associates a single file to a single shared arena. (Chris Hegarty, Michael Gibney, Uwe Schindler)
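To make the "grouping files from the same segment" part more tangible: all files whose names start with the same segment prefix (e.g. _9h0q for the _9h0q_de_Lucene90_0.dvd file in your maps output) end up in one shared Arena. The following is only a rough approximation of that grouping key, not Lucene's actual code:

```java
import java.util.List;

public class SegmentGrouping {
  // Rough approximation of grouping files by their owning segment: a Lucene
  // file name starts with the segment name ("_9h0q"), optionally followed by
  // more underscore-separated parts for doc values update generations
  // ("_de_Lucene90_0") and the extension (".dvd").
  static String segmentKey(String fileName) {
    int secondUnderscore = fileName.indexOf('_', 1);
    int dot = fileName.indexOf('.');
    int end = secondUnderscore >= 0 ? secondUnderscore
            : (dot >= 0 ? dot : fileName.length());
    return fileName.substring(0, end);
  }

  public static void main(String[] args) {
    List<String> files = List.of(
        "_9h0q.cfs",                  // base segment file
        "_9h0q_de_Lucene90_0.dvd",    // doc values update generation "de"
        "_9h0q_m9_Lucene90_0.dvd");   // a later update generation
    // All three resolve to the same key, so they would share one Arena.
    files.forEach(f -> System.out.println(f + " -> " + segmentKey(f)));
  }
}
```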

Actually it looks like there are many deletes on the same index segment, so the segment itself is not closed but the deletes are updated over and over. As the whole segment uses the same shared memory arena, it won't release the up to 1024 (the default value) mappings, and these count against the maxMapCount limit.

To work around the issue you can choose to reduce the setting as described above by passing it as a separate system property on OpenSearch's command line. I'd recommend using a smaller value like 64 for systems with many indexes.

Please tell us what you found out! Did reducing the sharedArenaMaxPermits limit help? Maybe a good idea would be to change Lucene / OpenSearch to open deletion files in a separate arena or use READONCE to load them into memory.

Uwe

On 07.05.2025 at 03:44, Justin Borromeo wrote:
Hi all,

After upgrading our OpenSearch cluster from 2.16.0 to 2.19.1 (moving from Lucene 9.10 to Lucene 9.12), our largest clusters started crashing with the following error:

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 2097152 bytes. Error detail: AllocateHeap

We narrowed down the issue to the vm max map count (262144) being reached. Prior to the server crash, we see the map count (measured by `cat /proc/{pid}/maps | wc -l`) approach the 262144 limit we set.  Looking at one of the outputs of `cat /proc/{pid}/maps`, we observed that 246K of the 252K maps are for deleted doc values (.dvd) files.
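For completeness, here is a rough Java equivalent of that shell check (not what runs in production, just a diagnostic sketch; the pid is passed as the first argument and the ".dvd (deleted)" filter mirrors the maps output above):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CountDeletedDvdMaps {
  public static void main(String[] args) throws Exception {
    String pid = args.length > 0 ? args[0] : "self";
    List<String> lines = Files.readAllLines(Path.of("/proc", pid, "maps"));
    // Count mappings whose backing file is a doc values data file that has
    // already been deleted on disk.
    long deletedDvd = lines.stream()
        .filter(l -> l.contains(".dvd") && l.endsWith("(deleted)"))
        .count();
    System.out.printf("total mappings: %d, deleted .dvd mappings: %d%n",
        lines.size(), deletedDvd);
  }
}
```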

Is this expected?  If so, were there any changes in the Lucene codebase between those two versions that could have caused this?  Any suggestions on debugging?

Thanks in advance and sorry if this is a better question for the OS
community or the Lucene developer list.

Justin Borromeo

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de




--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

