[
https://issues.apache.org/jira/browse/IGNITE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Goncharuk resolved IGNITE-8790.
--------------------------------------
Resolution: Duplicate
> JVM crash during memory recovery
> --------------------------------
>
> Key: IGNITE-8790
> URL: https://issues.apache.org/jira/browse/IGNITE-8790
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Goncharuk
> Assignee: Alexey Goncharuk
> Priority: Major
>
> I've observed the following JVM crash after one of the Ignite node restarts
> on 2.5 (only the relevant part is kept):
> {code}
> Stack: [0x00007f16f40b8000,0x00007f16f41b9000], sp=0x00007f16f41b7308, free space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0x803675]
> J 868 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007f173d351ca1 [0x00007f173d351bc0+0xe1]
> J 3023 C1 org.apache.ignite.internal.util.GridUnsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (77 bytes) @ 0x00007f173d9e8d64 [0x00007f173d9e8ae0+0x284]
> J 2991 C1 org.apache.ignite.internal.pagemem.PageUtils.putBytes(JI[B)V (73 bytes) @ 0x00007f173d9e1dbc [0x00007f173d9e1d00+0xbc]
> j org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(Lorg/apache/ignite/internal/processors/cache/persistence/GridCacheDatabaseSharedManager$CheckpointStatus;ZLorg/apache/ignite/internal/processors/cache/persistence/pagemem/PageMemoryEx;)Lorg/apache/ignite/internal/pagemem/wal/WALPointer;+568
> j org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(Lorg/apache/ignite/internal/processors/cache/persistence/GridCacheDatabaseSharedManager$CheckpointStatus;)Lorg/apache/ignite/internal/pagemem/wal/WALPointer;+13
> j org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(Ljava/util/List;)V+173
> j org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(Z)Lorg/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture$ExchangeType;+311
> j org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(Z)V+574
> j org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0()V+547
> j org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body()V+3
> j org.apache.ignite.internal.util.worker.GridWorker.run()V+82
> j java.lang.Thread.run()V+11
> v ~StubRoutines::call_stub
> V [libjvm.so+0x695b96]
> V [libjvm.so+0x6960a1]
> V [libjvm.so+0x696537]
> V [libjvm.so+0x71596e]
> V [libjvm.so+0xa7f243]
> V [libjvm.so+0xa7f38c]
> V [libjvm.so+0x92e0f8]
> C [libpthread.so.0+0x76ba] start_thread+0xca
> {code}
> It looks like the issue is caused by a page whose ID was rotated and the
> node failed before the checkpoint finished. Then, on the second node restart,
> the page was written to disk, but the node was stopped again before the
> checkpoint marker was written.
> Then, on the second node restart, we attempt to write-lock the page, but the
> lock fails because the page tag logged to the WAL is different than the one
> written in the store.
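
The failure mode described above can be sketched as a toy model. This is not Ignite code: the class, the `writeLock` and `applyRecord` methods, and the "null on tag mismatch" convention are all hypothetical stand-ins, and the ticket does not confirm that the crash comes from using the result of a failed lock as a copy target. The sketch only illustrates the general guard: when the tag recorded in the log no longer matches the tag stored with the page, the lock attempt fails and the copy has to be skipped rather than performed.

```java
import java.nio.ByteBuffer;

/** Toy model of a tag-checked page write; every name here is hypothetical. */
final class PageTagCheckSketch {
    /** A page with a rotation tag and a payload buffer. */
    static final class Page {
        int tag;
        final ByteBuffer buf;

        Page(int tag, int size) {
            this.tag = tag;
            this.buf = ByteBuffer.allocate(size);
        }

        /** Returns the page buffer if expectedTag matches the stored tag, otherwise null. */
        ByteBuffer writeLock(int expectedTag) {
            return expectedTag == tag ? buf : null;
        }
    }

    /** Copies a logged payload into the page only if the lock succeeded. */
    static boolean applyRecord(Page page, int loggedTag, byte[] payload) {
        ByteBuffer target = page.writeLock(loggedTag);

        if (target == null)
            return false; // Tag mismatch: the lock failed, so there is no valid copy target.

        if (payload.length > target.capacity())
            throw new IllegalArgumentException("Payload does not fit into the page");

        target.clear();      // Reset position so the payload lands at offset 0.
        target.put(payload); // Safe: we hold a valid buffer of sufficient size.

        return true;
    }

    public static void main(String[] args) {
        Page page = new Page(1, 16);

        System.out.println(applyRecord(page, 1, new byte[] {7, 8})); // Tags match.

        page.tag = 2; // Simulate a rotation that the logged record does not know about.

        System.out.println(applyRecord(page, 1, new byte[] {9}));    // Stale tag.
    }
}
```

In this model a stale record is rejected and the page content is left untouched. Whether recovery should skip such a record, fail the node, or handle it some other way is a separate design question that the sketch does not try to answer.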