Hi,

you can try to reduce sharedArenaMaxPermits further, down to 1 (which restores the old behaviour). I don't know what OpenSearch is doing with so many doc values updates, but you're already very close to the limits. If you need that many indexes on one machine and don't mind "slower" updates, you can for sure restore the previous behaviour.

The problem with the limits above is that Lucene also reopens and merges segment files. During those actions, and especially after IndexWriter has closed and deleted a segment, there can still be open references to its files, which makes the mappings survive longer than wanted.
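As a rough illustration of that last point, here is a minimal, self-contained sketch (the path and field name are made up for the example, this is not code from Lucene or OpenSearch) showing how an open reader keeps the mmaps of already-deleted segment files alive until it is closed:

```java
import java.nio.file.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.MMapDirectory;

public class MappingLifetimeSketch {
  public static void main(String[] args) throws Exception {
    try (MMapDirectory dir = new MMapDirectory(Path.of("/tmp/mmap-demo"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (String id : new String[] {"1", "2"}) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.NO));
        writer.addDocument(doc);
        writer.commit();                      // two commits -> two small segments
      }

      DirectoryReader reader = DirectoryReader.open(dir);  // pins the current segment files

      writer.forceMerge(1);                   // rewrites both segments into one
      writer.commit();                        // the old segment files are deleted on disk ...

      // ... but their mmaps stay alive (visible as "(deleted)" in /proc/<pid>/maps)
      // until the last reference goes away:
      reader.close();
    }
  }
}
```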

Of course there could be a bug in OpenSearch where it does not close a file correctly. They have a lot of customized codec code; maybe there's a bug in it. With plain Lucene it should not run out of file handles or mappings. We did not hear any complaints from Elasticsearch people, which leads me to think that this might be specific to OpenSearch.

So my only suggestion would be to reduce the limit to 1, which restores the previous behaviour. If you then still get problems, there's a "file not closed" bug somewhere in the custom Lucene code of OpenSearch. Maybe the recycling of IndexSearchers has problems. It is hard to figure out. Of course I would start searching at the places where the .dvd files are regenerated. Maybe that helps nailing down the issue.

Uwe

On 13.05.2025 at 04:12, Justin Borromeo wrote:
Hi Uwe,

Setting -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=64
didn't seem to help and we're still seeing restarts.  A question about your
response: what are normal update ratios?

Each of our machines is running 32 OpenSearch shards (Lucene indexes), each
with about 52 segments.  I could see how we're running into the 262144
max_map_count limit since 1024*32*52=1.7M.  If this were the cause though,
we'd expect the max permits change to help since 64 * 32 * 52 = 106K?
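For reference, here's the back-of-envelope calculation I'm using (just worst-case arithmetic, assuming every segment keeps its full permit count mapped; the shard and segment counts are the ones quoted above):

```java
public class MapBudget {
  public static void main(String[] args) {
    final long maxMapCount = 262_144;   // vm.max_map_count on our hosts
    final int shards = 32;              // OpenSearch shards (Lucene indexes) per machine
    final int segmentsPerShard = 52;    // observed average

    // Worst case: every segment keeps sharedArenaMaxPermits mappings alive.
    for (int permits : new int[] {1024, 64, 1}) {
      long worstCase = (long) shards * segmentsPerShard * permits;
      System.out.printf("permits=%4d -> %,10d mappings (limit %,d)%n",
          permits, worstCase, maxMapCount);
    }
  }
}
```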

Is the issue here that we're mapping too many files or that these files'
memory mappings aren't being "released" after being deleted?

Justin



On Fri, May 9, 2025 at 7:03 AM Uwe Schindler <u...@thetaphi.de> wrote:

Hi,

Did the sharedArenaMaxPermits=64 help?

Actually, sorry for the previous answer; I did not recognize that you were talking about doc values updates, I just saw "deleted". But basically the issue is the same: every update or delete will create a new file belonging to the same segment. As each segment by default can have up to 1024 mappings, this can add up quite fast toward the max mmap count. A typical index has 20 segments, so this could sum up to 20,000 mappings per index.
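To make that concrete, here is a small standalone sketch (the path and field names are invented for the example) showing how every committed in-place doc values update adds another .dvd generation file that still belongs to the same segment:

```java
import java.nio.file.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.MMapDirectory;

public class DocValuesUpdateSketch {
  public static void main(String[] args) throws Exception {
    try (MMapDirectory dir = new MMapDirectory(Path.of("/tmp/dv-demo"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      doc.add(new StringField("id", "1", Field.Store.NO));
      doc.add(new NumericDocValuesField("counter", 0L));
      writer.addDocument(doc);
      writer.commit();                          // writes the base segment, e.g. _0

      // Each committed in-place doc values update writes a new generation file
      // (something like _0_1_Lucene90_0.dvd, _0_2_Lucene90_0.dvd, ...) that still
      // belongs to segment _0 and therefore shares that segment's Arena.
      for (long i = 1; i <= 5; i++) {
        writer.updateNumericDocValue(new Term("id", "1"), "counter", i);
        writer.commit();
      }
    }
  }
}
```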

I don't remember why Chris set the limit to 1024. In most cases segments will only have a dozen files at most, and with normal update ratios it should be fine to limit the number of open mappings to 64 or even lower.

If the change helps for you, we can open an issue to adapt the defaults for the shared arenas. BTW, if you want to go back to the default of previous Lucene versions, use 1, but this could degrade performance when you have many updates.

Uwe

On 07.05.2025 at 21:48, Justin Borromeo wrote:
Hi Uwe,

Thanks for the response.  We've tried setting sharedArenaMaxPermits to 64; I'll update this thread once we get some data.

One thing I don't understand: why does the list of deleted mmapped files only include doc values files?  If your theory is correct and this is caused by deletes being updated over and over, wouldn't we expect only .liv files to be deleted?

Output:
```
7ed8a2754000-7ed8a2757000 r--s 00000000 08:10 80872480   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/196/index/_9h0q_de_Lucene90_0.dvd (deleted)
7ed8a2757000-7ed8a275c000 r--s 00000000 08:10 78838113   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/119/index/_912j_m9_Lucene90_0.dvd (deleted)
7ed8a275c000-7ed8a275f000 r--s 00000000 08:10 78830146   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/126/index/_9buk_4e_Lucene90_0.dvd (deleted)
```

Justin

On Wed, May 7, 2025 at 9:50 AM Uwe Schindler <u...@thetaphi.de> wrote:

Hi,

this could be related to a bug or limitation of the following change:

   1. GITHUB#13570 <https://github.com/apache/lucene/pull/13570>, GITHUB#13574 <https://github.com/apache/lucene/pull/13574>, GITHUB#13535 <https://github.com/apache/lucene/pull/13535>: Avoid performance degradation with closing shared Arenas. Closing many individual index files can potentially lead to a degradation in execution performance. Index files are mmapped one-to-one with the JDK's foreign shared Arena. The JVM deoptimizes the top few frames of all threads when closing a shared Arena (see JDK-8335480). We mitigate this situation when running with JDK 21 and greater, by 1) using a confined Arena where appropriate, and 2) grouping files from the same segment to a single shared Arena. A system property has been added that allows to control the total maximum number of mmapped files that may be associated with a single shared Arena. For example, to set the max number of permits to 256, pass the following on the command line: -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=256. Setting a value of 1 associates a single file to a single shared arena. (Chris Hegarty, Michael Gibney, Uwe Schindler)
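To make the "grouping files from the same segment" part more tangible: all files whose names start with the same segment prefix (e.g. _9h0q for the _9h0q_de_Lucene90_0.dvd file in your maps output) end up in one shared Arena. The following is only a rough approximation of that grouping key, not Lucene's actual code:

```java
import java.util.List;

public class SegmentGrouping {
  // Rough approximation of grouping files by their owning segment: a Lucene
  // file name starts with the segment name ("_9h0q"), optionally followed by
  // more underscore-separated parts for doc values update generations
  // ("_de_Lucene90_0") and the extension (".dvd").
  static String segmentKey(String fileName) {
    int secondUnderscore = fileName.indexOf('_', 1);
    int dot = fileName.indexOf('.');
    int end = secondUnderscore >= 0 ? secondUnderscore
            : (dot >= 0 ? dot : fileName.length());
    return fileName.substring(0, end);
  }

  public static void main(String[] args) {
    List<String> files = List.of(
        "_9h0q.cfs",                  // base segment file
        "_9h0q_de_Lucene90_0.dvd",    // doc values update generation "de"
        "_9h0q_m9_Lucene90_0.dvd");   // a later update generation
    // All three resolve to the same key, so they would share one Arena.
    files.forEach(f -> System.out.println(f + " -> " + segmentKey(f)));
  }
}
```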

Actually it looks like there are many deletes on the same index segment, so the segment itself is not closed but the deletes are updated over and over. As the whole segment uses the same shared memory arena, it won't release the up to 1024 (the default value) mappings, and these count against the maxMapCount limit.

To work around the issue you can choose to reduce the setting as described above by passing it as a separate system property on OpenSearch's command line. I'd recommend using a smaller value like 64 for systems with many indexes.

Please tell us what you found out! Did reducing the sharedArenaMaxPermits limit help? Maybe a good idea would be to change Lucene / OpenSearch to open deletion files in a separate arena or use READONCE to load them into memory.

Uwe

On 07.05.2025 at 03:44, Justin Borromeo wrote:
Hi all,

After upgrading our OpenSearch cluster from 2.16.0 to 2.19.1 (moving from Lucene 9.10 to Lucene 9.12), our largest clusters started crashing with the following error:

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 2097152 bytes. Error detail: AllocateHeap

We narrowed down the issue to the vm max map count (262144) being reached. Prior to the server crash, we see the map count (measured by `cat /proc/{pid}/maps | wc -l`) approach the 262144 limit we set.  Looking at one of the outputs of `cat /proc/{pid}/maps`, we observed that 246K of the 252K maps are for deleted doc values (.dvd) files.
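For completeness, here is a rough Java equivalent of that shell check (not what runs in production, just a diagnostic sketch; the pid is passed as the first argument and the ".dvd (deleted)" filter mirrors the maps output above):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CountDeletedDvdMaps {
  public static void main(String[] args) throws Exception {
    String pid = args.length > 0 ? args[0] : "self";
    List<String> lines = Files.readAllLines(Path.of("/proc", pid, "maps"));
    // Count mappings whose backing file is a doc values data file that has
    // already been deleted on disk.
    long deletedDvd = lines.stream()
        .filter(l -> l.contains(".dvd") && l.endsWith("(deleted)"))
        .count();
    System.out.printf("total mappings: %d, deleted .dvd mappings: %d%n",
        lines.size(), deletedDvd);
  }
}
```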

Is this expected?  If so, were there any changes in the Lucene codebase between those two versions that could have caused this?  Any suggestions on debugging?

Thanks in advance and sorry if this is a better question for the OS
community or the Lucene developer list.

Justin Borromeo

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de




--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

