Hi Uwe,

Setting -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=64
didn't seem to help; we're still seeing restarts.  One question about your
response: what would you consider a normal update ratio?

Each of our machines runs 32 OpenSearch shards (Lucene indexes), each
with about 52 segments.  I can see how we'd hit the 262144
max_map_count limit, since 1024 * 32 * 52 = 1.7M.  If that were the cause,
though, wouldn't we expect the max-permits change to help, since 64 * 32 * 52 = 106K?
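For reference, here's the arithmetic above as a quick shell check (the shard
and segment counts are from our nodes; 1024 is the Lucene default permit
count, 64 the override we tried):

```shell
# Worst-case mappings per machine = shards * segments * per-arena permits.
# Compare against the vm.max_map_count limit of 262144.
echo $((32 * 52 * 1024))   # default permits  -> 1703936, well over the limit
echo $((32 * 52 * 64))     # with sharedArenaMaxPermits=64 -> 106496, under it
```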

Is the issue here that we're mapping too many files or that these files'
memory mappings aren't being "released" after being deleted?
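For reference, this is roughly how we've been counting the deleted doc-values
mappings. The sketch below runs against a sample file with shortened,
made-up paths; on a live node you'd point the grep at /proc/&lt;pid&gt;/maps instead:

```shell
# Build a small stand-in for /proc/<pid>/maps (paths are hypothetical).
cat > /tmp/maps.sample <<'EOF'
7ed8a2754000-7ed8a2757000 r--s 00000000 08:10 80872480 /data/index/_9h0q_de_Lucene90_0.dvd (deleted)
7ed8a2757000-7ed8a275c000 r--s 00000000 08:10 78838113 /data/index/_912j_m9_Lucene90_0.dvd (deleted)
7ed8a275c000-7ed8a275f000 r--s 00000000 08:10 78830146 /data/index/_9buk_4e.cfs
EOF
# Count mappings that still reference deleted doc-values files.
grep -c '\.dvd (deleted)$' /tmp/maps.sample   # prints 2
```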

Justin



On Fri, May 9, 2025 at 7:03 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> Did the sharedArenaMaxPermits=64 help?
>
> Actually, sorry for my earlier answer; I did not realize that you were
> talking about doc values updates, I just saw deletes. But basically the
> issue is the same: every update or delete creates a new file belonging to
> the same segment. As each segment can by default have 1024 mappings, this
> can add up to the max mmap count quite fast. A typical index has 20
> segments, so this could sum to 20,000 mappings per index.
>
> I don't remember why Chris set the limit to 1024. In most cases segments
> only have a dozen files at most, and with a normal update ratio the
> number of open mappings should be fine when limited to 64 or even
> lower.
>
> If the change helps for you, we can open an issue to adapt the defaults
> for the shared arenas. BTW, if you want to go back to the default of
> previous Lucene versions, use 1, but this could degrade
> performance when you have many updates.
>
> Uwe
>
> Am 07.05.2025 um 21:48 schrieb Justin Borromeo:
> > Hi Uwe,
> >
> > Thanks for the response.  We've tried setting sharedArenaMaxPermits to 64;
> > I'll update this thread once we get some data.
> >
> > One thing I don't understand is why the list of deleted mmapped
> > files only includes doc values files.  If your theory is correct and this
> > is caused by the deletes being updated over and over, wouldn't we expect
> > only .liv files to be deleted?
> >
> > Output:
> > ```
> > 7ed8a2754000-7ed8a2757000 r--s 00000000 08:10 80872480 /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/196/index/_9h0q_de_Lucene90_0.dvd (deleted)
> > 7ed8a2757000-7ed8a275c000 r--s 00000000 08:10 78838113 /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/119/index/_912j_m9_Lucene90_0.dvd (deleted)
> > 7ed8a275c000-7ed8a275f000 r--s 00000000 08:10 78830146 /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/126/index/_9buk_4e_Lucene90_0.dvd (deleted)
> > ```
> >
> > Justin
> >
> > On Wed, May 7, 2025 at 9:50 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >
> >> Hi,
> >>
> >> this could be related to a bug or limitation of the following change:
> >>
> >>   1. GITHUB#13570
> >>      <https://github.com/apache/lucene/pull/13570>,GITHUB#13574
> >>      <https://github.com/apache/lucene/pull/13574>,GITHUB#13535
> >>      <https://github.com/apache/lucene/pull/13535>: Avoid performance
> >>      degradation with closing shared Arenas. Closing many individual
> >>      index files can potentially lead to a degradation in execution
> >>      performance. Index files are mmapped one-to-one with the JDK's
> >>      foreign shared Arena. The JVM deoptimizes the top few frames of all
> >>      threads when closing a shared Arena (see JDK-8335480). We mitigate
> >>      this situation when running with JDK 21 and greater, by *1) using a
> >>      confined Arena where appropriate, and 2) grouping files from the
> >>      same segment to a single shared Arena*. A system property has been
> >>      added that allows to control the total maximum number of mmapped
> >>      files that may be associated with a single shared Arena. For
> >>      example, to set the max number of permits to 256, pass the
> >>      following on the command line:
> >>      -Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=256.
> >>      Setting a value of 1 associates a single file with a single shared
> >>      arena.
> >>      (Chris Hegarty, Michael Gibney, Uwe Schindler)
> >>
> >> Actually it looks like there are many deletes on the same index segment,
> >> so the segment itself is not closed but the deletes are updated over and
> >> over. As the whole segment uses the same shared memory arena, it
> >> won't release all 1024 (the default value) mappings, and these count
> >> against the maxMapCount limit.
> >>
> >> To work around the issue you can reduce the setting as described
> >> above by passing it as a separate system property on
> >> OpenSearch's command line. I'd recommend a smaller value like 64
> >> for systems with many indexes.
> >>
> >> Please let us know what you find out! Did reducing the
> >> sharedArenaMaxPermits limit help? It might also be a good idea to change
> >> Lucene / OpenSearch to open deletion files in a separate arena or use
> >> READONCE to load them into memory.
> >>
> >> Uwe
> >>
> >> Am 07.05.2025 um 03:44 schrieb Justin Borromeo:
> >>> Hi all,
> >>>
> >>> After upgrading our OpenSearch cluster from 2.16.0 to 2.19.1 (moving
> >>> from Lucene 9.10 to Lucene 9.12), our largest clusters started crashing
> >>> with the following error:
> >>>
> >>> # There is insufficient memory for the Java Runtime Environment to
> >>> # continue.
> >>> # Native memory allocation (malloc) failed to allocate 2097152 bytes.
> >>> # Error detail: AllocateHeap
> >>>
> >>> We narrowed down the issue to the vm max map count (262144) being
> >>> reached.  Prior to the server crash, we see the map count (measured by
> >>> `cat /proc/{pid}/maps | wc -l`) approach the 262144 limit we set.
> >>> Looking at one of the outputs of `cat /proc/{pid}/maps`, we observed
> >>> that 246K of the 252K maps are for deleted doc values (.dvd) files.
> >>>
> >>> Is this expected?  If so, were there any changes in the Lucene codebase
> >>> between those two versions that could have caused this?  Any suggestions
> >>> on debugging?
> >>>
> >>> Thanks in advance and sorry if this is a better question for the OS
> >>> community or the Lucene developer list.
> >>>
> >>> Justin Borromeo
> >>>
> >> --
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> https://www.thetaphi.de
> >> eMail:u...@thetaphi.de
> >>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
