[ https://issues.apache.org/jira/browse/IGNITE-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Goncharuk resolved IGNITE-8790.
--------------------------------------
    Resolution: Duplicate

> JVM crash during memory recovery
> --------------------------------
>
>                 Key: IGNITE-8790
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8790
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Assignee: Alexey Goncharuk
>            Priority: Major
>
> I've observed the following JVM crash after one of the Ignite node restarts
> on version 2.5 (only the relevant part is kept):
> {code}
> Stack: [0x00007f16f40b8000,0x00007f16f41b9000],  sp=0x00007f16f41b7308,  free space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0x803675]
> J 868  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007f173d351ca1 [0x00007f173d351bc0+0xe1]
> J 3023 C1 org.apache.ignite.internal.util.GridUnsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (77 bytes) @ 0x00007f173d9e8d64 [0x00007f173d9e8ae0+0x284]
> J 2991 C1 org.apache.ignite.internal.pagemem.PageUtils.putBytes(JI[B)V (73 bytes) @ 0x00007f173d9e1dbc [0x00007f173d9e1d00+0xbc]
> j  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(Lorg/apache/ignite/internal/processors/cache/persistence/GridCacheDatabaseSharedManager$CheckpointStatus;ZLorg/apache/ignite/internal/processors/cache/persistence/pagemem/PageMemoryEx;)Lorg/apache/ignite/internal/pagemem/wal/WALPointer;+568
> j  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(Lorg/apache/ignite/internal/processors/cache/persistence/GridCacheDatabaseSharedManager$CheckpointStatus;)Lorg/apache/ignite/internal/pagemem/wal/WALPointer;+13
> j  org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(Ljava/util/List;)V+173
> j  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(Z)Lorg/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture$ExchangeType;+311
> j  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(Z)V+574
> j  org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0()V+547
> j  org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body()V+3
> j  org.apache.ignite.internal.util.worker.GridWorker.run()V+82
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x695b96]
> V  [libjvm.so+0x6960a1]
> V  [libjvm.so+0x696537]
> V  [libjvm.so+0x71596e]
> V  [libjvm.so+0xa7f243]
> V  [libjvm.so+0xa7f38c]
> V  [libjvm.so+0x92e0f8]
> C  [libpthread.so.0+0x76ba]  start_thread+0xca
> {code}
> It looks like the issue is caused by a page whose ID was rotated, after which
> the node failed before the checkpoint finished. Then, on the second node restart,
> the page was written to disk, but the node was stopped again before the
> checkpoint marker was written.
> Then, on the next node restart, we attempt to write-lock the page, but the lock
> fails because the page tag logged to the WAL is different from the one written
> in the store.
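The tag mismatch described above can be modelled in a few lines of plain Java. This is a hypothetical sketch, not Ignite code: `TaggedPageStore`, `PageSnapshotRecord` and `applySnapshot` are invented names, and the sketch assumes that a failed write lock surfaces as "no usable page" (here `null`) which recovery has to check before copying the logged bytes. The scenario replays a WAL page snapshot that still carries the old tag after the rotated tag has already reached the store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of tag-checked page locking during WAL replay.
// None of these names are Ignite APIs; they only illustrate the failure mode.
class PageTagRecoverySketch {

    // Page store: each page index has a "tag" that changes when the page ID is rotated.
    static final class TaggedPageStore {
        private final Map<Long, Integer> tags = new HashMap<>();
        private final Map<Long, byte[]> pages = new HashMap<>();
        private final int pageSize;

        TaggedPageStore(int pageSize) { this.pageSize = pageSize; }

        void allocate(long pageIdx, int tag) {
            tags.put(pageIdx, tag);
            pages.put(pageIdx, new byte[pageSize]);
        }

        // Rotating the page ID bumps the tag stored alongside the page.
        void rotate(long pageIdx) {
            tags.merge(pageIdx, 1, Integer::sum);
        }

        int tagOf(long pageIdx) { return tags.get(pageIdx); }

        // Returns the page buffer only if the caller's tag matches the stored one;
        // null plays the role of a failed lock (no usable page address).
        byte[] writeLock(long pageIdx, int expectedTag) {
            Integer tag = tags.get(pageIdx);
            return (tag != null && tag == expectedTag) ? pages.get(pageIdx) : null;
        }
    }

    // A full-page snapshot as logged to the WAL, including the tag seen at logging time.
    static final class PageSnapshotRecord {
        final long pageIdx;
        final int tag;
        final byte[] data;

        PageSnapshotRecord(long pageIdx, int tag, byte[] data) {
            this.pageIdx = pageIdx;
            this.tag = tag;
            this.data = data;
        }
    }

    // Replays one snapshot; returns false instead of writing when the lock fails.
    static boolean applySnapshot(TaggedPageStore store, PageSnapshotRecord rec) {
        byte[] page = store.writeLock(rec.pageIdx, rec.tag);
        if (page == null) {
            // Tag in the WAL differs from the tag in the store: skip the copy
            // rather than writing through a page we do not hold.
            return false;
        }
        System.arraycopy(rec.data, 0, page, 0, rec.data.length);
        return true;
    }

    public static void main(String[] args) {
        TaggedPageStore store = new TaggedPageStore(8);
        store.allocate(1L, 0);

        // WAL snapshot taken while the page still had tag 0.
        PageSnapshotRecord rec = new PageSnapshotRecord(1L, 0, new byte[] {1, 2, 3});

        // The page ID is rotated and the new tag reaches the store,
        // but recovery still replays the WAL record carrying the old tag.
        store.rotate(1L);

        System.out.println("store tag = " + store.tagOf(1L) + ", WAL tag = " + rec.tag);
        System.out.println("snapshot applied: " + applySnapshot(store, rec));
    }
}
```

Running `main` prints `store tag = 1, WAL tag = 0` followed by `snapshot applied: false`. The explicit check is the point of the sketch: without it, `System.arraycopy` would throw a `NullPointerException`, whereas copying through raw addresses with `Unsafe.copyMemory` may crash the JVM instead of throwing. That this is how the mismatch led to the native crash above is an assumption of the sketch, not something the report states.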



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
