[ https://issues.apache.org/jira/browse/KAFKA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Masahiro Mori reassigned KAFKA-19390:
-------------------------------------

    Assignee: Masahiro Mori

> AbstractIndex#resize() does not release old mmap on Linux
> ---------------------------------------------------------
>
>                 Key: KAFKA-19390
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19390
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.8.1
>            Reporter: Masahiro Mori
>            Assignee: Masahiro Mori
>            Priority: Major
>              Labels: Linux
>
> Our Kafka broker crashed with the following error:
> {code:java}
> [2025-03-29 09:37:03,218] ERROR Error while appending records to <topic>-<partition> in dir /kafka-logs/data ...
> java.io.IOException: Map failed
>     at java.base/sun.nio.ch.FileChannelImpl.mapInternal(FileChannelImpl.java:1127)
>     at java.base/sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:1032)
>     at org.apache.kafka.storage.internals.log.AbstractIndex.createMappedBuffer(AbstractIndex.java:467)
>     at org.apache.kafka.storage.internals.log.AbstractIndex.createAndAssignMmap(AbstractIndex.java:105)
>     at org.apache.kafka.storage.internals.log.AbstractIndex.<init>(AbstractIndex.java:83)
>     at org.apache.kafka.storage.internals.log.TimeIndex.<init>(TimeIndex.java:65)
>     at org.apache.kafka.storage.internals.log.LazyIndex.loadIndex(LazyIndex.java:242)
>     at org.apache.kafka.storage.internals.log.LazyIndex.get(LazyIndex.java:179)
>     at org.apache.kafka.storage.internals.log.LogSegment.timeIndex(LogSegment.java:146)
>     at org.apache.kafka.storage.internals.log.LogSegment.readMaxTimestampAndOffsetSoFar(LogSegment.java:201)
>     at org.apache.kafka.storage.internals.log.LogSegment.maxTimestampSoFar(LogSegment.java:211)
>     at org.apache.kafka.storage.internals.log.LogSegment.append(LogSegment.java:262)
>     at kafka.log.LocalLog.append(LocalLog.scala:417)
>     ...
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at java.base/sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at java.base/sun.nio.ch.FileChannelImpl.mapInternal(FileChannelImpl.java:1124)
>     ... 33 more
> {code}
> We found that the Kafka process had hit the vm.max_map_count limit (which
> was set to 262144) and that most of the mapped entries corresponded to
> deleted index files.
> {code:java}
> > sudo cat /proc/${KAFKA_PID}/maps | grep deleted
> 7d8c5cc00000-7d8c5d600000 rw-s 00000000 08:11 202854769 /kafka-logs/data/topic1-22/00000000332910579773.timeindex.deleted (deleted)
> 7d8c5d800000-7d8c5e200000 rw-s 00000000 08:11 202854768 /kafka-logs/data/topic1-22/00000000332910579773.index.deleted (deleted)
> 7d8c67400000-7d8c67e00000 rw-s 00000000 08:11 202562514 /kafka-logs/data/topic2-116/00000000165968090794.timeindex.deleted (deleted)
> 7d8c68000000-7d8c68a00000 rw-s 00000000 08:11 202562513 /kafka-logs/data/topic2-116/00000000165968090794.index.deleted (deleted)
> 7d8c6d400000-7d8c6de00000 rw-s 00000000 08:11 202596518 /kafka-logs/data/topic2-356/00000000168702579081.timeindex.deleted (deleted)
> 7d8c6e000000-7d8c6ea00000 rw-s 00000000 08:11 202596517 /kafka-logs/data/topic2-356/00000000168702579081.index.deleted (deleted)
> 7d8c71c00000-7d8c72600000 rw-s 00000000 08:11 202798981 /kafka-logs/data/topic3-433/00000000116740630582.timeindex.deleted (deleted)
> 7d8c72800000-7d8c73200000 rw-s 00000000 08:11 202798980 /kafka-logs/data/topic3-433/00000000116740630582.index.deleted (deleted)
> 7d8c77c00000-7d8c78600000 rw-s 00000000 08:11 202754947 /kafka-logs/data/topic3-74/00000000118067749684.timeindex.deleted (deleted)
> 7d8c78800000-7d8c79200000 rw-s 00000000 08:11 202754946 /kafka-logs/data/topic3-74/00000000118067749684.index.deleted (deleted)
> 7d8c79400000-7d8c79e00000 rw-s 00000000 08:11 202813710 /kafka-logs/data/topic2-82/00000000162756700035.timeindex.deleted (deleted)
> 7d8c7a000000-7d8c7aa00000 rw-s 00000000 08:11 202813709 /kafka-logs/data/topic2-82/00000000162756700035.index.deleted (deleted)
> 7d8c7ac00000-7d8c7b600000 rw-s 00000000 08:11 202596526 /kafka-logs/data/topic2-355/00000000169939763750.timeindex.deleted (deleted)
> 7d8c7b800000-7d8c7c200000 rw-s 00000000 08:11 202596525 /kafka-logs/data/topic2-355/00000000169939763750.index.deleted (deleted)
> 7d8c7c400000-7d8c7ce00000 rw-s 00000000 08:11 202562498 /kafka-logs/data/topic2-295/00000000168913981903.timeindex.deleted (deleted)
> 7d8c7d000000-7d8c7da00000 rw-s 00000000 08:11 202562497 /kafka-logs/data/topic2-295/00000000168913981903.index.deleted (deleted)
> 7d8c80c00000-7d8c81600000 rw-s 00000000 08:11 202754939 /kafka-logs/data/topic3-13/00000000115588098896.timeindex.deleted (deleted)
> 7d8c81800000-7d8c82200000 rw-s 00000000 08:11 202754938 /kafka-logs/data/topic3-13/00000000115588098896.index.deleted (deleted)
> 7d8c83c00000-7d8c84600000 rw-s 00000000 08:11 202798989 /kafka-logs/data/topic3-314/00000000118254254601.timeindex.deleted (deleted)
> 7d8c84800000-7d8c85200000 rw-s 00000000 08:11 202798988 /kafka-logs/data/topic3-314/00000000118254254601.index.deleted (deleted)
> ...
> {code}
> In AbstractIndex.resize(), the old memory mapping is explicitly unmapped on
> Windows or z/OS using safeForceUnmap(), but on Linux the unmapping step is
> skipped.
> The same issue was originally reported in KAFKA-7442, but the corresponding
> pull request was never merged.
> We propose that resize() should call safeForceUnmap() on all platforms to
> prevent stale mappings from lingering.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
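
To check whether a broker is accumulating stale mappings in the way described above, the `grep deleted` step can be turned into a count. The following is a minimal sketch, not part of Kafka: the class name `DeletedMapCounter` and its methods are hypothetical, and it assumes the Linux `/proc/<pid>/maps` format shown in the report, where a mapping of an unlinked file ends in `(deleted)`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical diagnostic helper (not part of Kafka): summarizes how many
// entries in a /proc/<pid>/maps listing point at files that were deleted.
final class DeletedMapCounter {

    // A maps line whose backing file has been unlinked ends with "(deleted)".
    static boolean isDeletedMapping(String line) {
        return line.trim().endsWith("(deleted)");
    }

    // Counts all mappings whose backing file has been deleted.
    static long countDeleted(List<String> mapsLines) {
        return mapsLines.stream()
                .filter(DeletedMapCounter::isDeletedMapping)
                .count();
    }

    // Counts deleted mappings that look like Kafka offset or time index files,
    // based on the ".index.deleted" / ".timeindex.deleted" names in the report.
    static long countDeletedIndexes(List<String> mapsLines) {
        return mapsLines.stream()
                .filter(DeletedMapCounter::isDeletedMapping)
                .filter(l -> l.contains(".index.deleted")
                        || l.contains(".timeindex.deleted"))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to this JVM's own maps file; pass /proc/<pid>/maps to
        // inspect another process (this usually needs matching privileges).
        Path maps = Path.of(args.length > 0 ? args[0] : "/proc/self/maps");
        if (!Files.isReadable(maps)) {
            System.err.println("Cannot read " + maps + " (Linux only?)");
            return;
        }
        List<String> lines = Files.readAllLines(maps);
        System.out.println("total mappings:         " + lines.size());
        System.out.println("deleted mappings:       " + countDeleted(lines));
        System.out.println("deleted index mappings: " + countDeletedIndexes(lines));
    }
}
```

With Java 11 or later this can be run directly, for example `sudo java DeletedMapCounter.java /proc/${KAFKA_PID}/maps`. Comparing the total against `vm.max_map_count` gives a rough idea of how close the process is to the limit.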