[ 
https://issues.apache.org/jira/browse/IGNITE-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ksenia Rybakova updated IGNITE-7731:
------------------------------------
    Description: 
During a failover test, the restarted node fails to start with the following exception:
{noformat}
[2018-02-15 12:17:46,388][INFO 
][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status 
[startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p200000-r400000-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin,
 endMarker=null]
[2018-02-15 12:17:46,389][INFO 
][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state 
[lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], 
lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, 
forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158]
[2018-02-15 12:17:46,389][WARN 
][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in 
the middle of checkpoint. Will restore memory state and finish checkpoint on 
node start.
[2018-02-15 
12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] 
Failed to reinitialize local partitions (preloading will be stopped): 
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, 
minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], 
sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], 
discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, 
loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, 
nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], 
nodeId=b8684b5c, evt=NODE_JOINED]
java.lang.ClassCastException: 
org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast to 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 at java.lang.Thread.run(Thread.java:748)

{noformat}
This happens if the node was killed during a checkpoint (seemingly only during 
the first one).

Load config:
 * Yardstick with CacheRandomOperationBenchmark
 * 12 client nodes, 24 server nodes, 12 hosts (2 per host). The issue is also 
reproduced when restarted node is 1 per host.
 * Several caches with different configs: pds/in memory, tx/atomic, 
with/without eviction etc. No dynamic caches. Complete configs are attached.
 * 1 node is restarted periodically.

Logs of the restarted node are attached.
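
The stack trace shows getPageMemoryForCacheGroup, called from restoreMemory, casting a PageMemoryNoStoreImpl to PageMemoryEx. The sketch below is only an illustration of that failure mode, not Ignite code and not a proposed patch: every type and method name in it is a hypothetical stand-in, and it assumes (unconfirmed) that checkpoint recovery looked up the page memory of an in-memory cache group on a node that mixes PDS and in-memory caches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class PageMemoryCastSketch {
    /** Hypothetical stand-in for a generic page memory (not Ignite's interface). */
    interface PageMemory { }

    /** Hypothetical stand-in for the persistence-aware page memory that recovery needs. */
    interface PageMemoryEx extends PageMemory { }

    /** In-memory region: implements only the base interface. */
    static final class NoStorePageMemory implements PageMemory { }

    /** Persistent region: implements the extended interface. */
    static final class PersistentPageMemory implements PageMemoryEx { }

    /** Unchecked downcast: throws ClassCastException for an in-memory group. */
    static PageMemoryEx uncheckedLookup(Map<Integer, PageMemory> groups, int grpId) {
        return (PageMemoryEx) groups.get(grpId);
    }

    /** Guarded lookup: returns empty for groups that have no persistent page memory. */
    static Optional<PageMemoryEx> guardedLookup(Map<Integer, PageMemory> groups, int grpId) {
        PageMemory mem = groups.get(grpId);
        return mem instanceof PageMemoryEx ? Optional.of((PageMemoryEx) mem) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed setup: one persistent cache group and one in-memory cache group.
        Map<Integer, PageMemory> groups = new LinkedHashMap<>();
        groups.put(1, new PersistentPageMemory());
        groups.put(2, new NoStorePageMemory());

        // Guarded path: in-memory groups are skipped instead of being cast.
        for (int grpId : groups.keySet()) {
            boolean restorable = guardedLookup(groups, grpId).isPresent();
            System.out.println("group " + grpId + (restorable ? ": restore pages" : ": skip (in-memory)"));
        }

        // Unchecked path: the same kind of exception as in the report.
        try {
            uncheckedLookup(groups, 2);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Whether skipping such groups is correct for real recovery depends on why the in-memory group was referenced in the first place, which the attached logs would have to confirm.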

 

  was:
During a failover test, the restarted node fails to start with the following exception:
{noformat}
[2018-02-15 12:17:46,388][INFO 
][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status 
[startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p200000-r400000-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin,
 endMarker=null]
[2018-02-15 12:17:46,389][INFO 
][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state 
[lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], 
lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, 
forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158]
[2018-02-15 12:17:46,389][WARN 
][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in 
the middle of checkpoint. Will restore memory state and finish checkpoint on 
node start.
[2018-02-15 
12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] 
Failed to reinitialize local partitions (preloading will be stopped): 
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, 
minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], 
sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], 
discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, 
loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, 
nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], 
nodeId=b8684b5c, evt=NODE_JOINED]
java.lang.ClassCastException: 
org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast to 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 at java.lang.Thread.run(Thread.java:748)

{noformat}
This happens if the node was killed during a checkpoint (seemingly only during 
the first one).

Load config:
 * Yardstick with CacheRandomOperationBenchmark
 * 12 client nodes, 24 server nodes, 12 hosts (2 per host). The issue is also 
reproduced when restarted node is 1 per host.
 * Several caches with different configs: pds/in memory, tx/atomic, 
with/without eviction etc. No dynamic caches. Complete configs are attached.
 * 1 node is restarted periodically.


> ClassCastException at restarted node if killed during checkpoint
> ----------------------------------------------------------------
>
>                 Key: IGNITE-7731
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7731
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Ksenia Rybakova
>            Priority: Major
>         Attachments: 
> 120728_id11_172.25.1.49_cache-random-benchmark-2-backup.log, 
> 121729_id11-1_172.25.1.49cache-random-benchmark-2-backup.log, 
> ignite-base-load-config.xml, run-load.properties, run-load.xml
>
>
> During a failover test, the restarted node fails to start with the following 
> exception:
> {noformat}
> [2018-02-15 12:17:46,388][INFO 
> ][exchange-worker-#62][GridCacheDatabaseSharedManager] Read checkpoint status 
> [startMarker=/storage/ssd/krybakova/20181502-120637-2.3.0-SNAPSHOT-failover-756ae8d4-c12-s24-p200000-r400000-b2-d7200/yardstick/work/db/node00-730721d0-e532-4f3a-b9e9-29277c0b7a9a/cp/1518685946892-39fa4858-66cb-4c88-9a1c-13a8625e1158-START.bin,
>  endMarker=null]
> [2018-02-15 12:17:46,389][INFO 
> ][exchange-worker-#62][GridCacheDatabaseSharedManager] Checking memory state 
> [lastValidPos=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false], 
> lastMarked=FileWALPointer [idx=1, fileOffset=47809760, len=177151, 
> forceFlush=false], lastCheckpointId=39fa4858-66cb-4c88-9a1c-13a8625e1158]
> [2018-02-15 12:17:46,389][WARN 
> ][exchange-worker-#62][GridCacheDatabaseSharedManager] Ignite node stopped in 
> the middle of checkpoint. Will restore memory state and finish checkpoint on 
> node start.
> [2018-02-15 
> 12:17:46,448][ERROR][exchange-worker-#62][GridDhtPartitionsExchangeFuture] 
> Failed to reinitialize local partitions (preloading will be stopped): 
> GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=38, 
> minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=b8684b5c-29f5-41db-bedc-f4b4ee4cab6b, addrs=[127.0.0.1, 172.25.1.49], 
> sockAddrs=[lab49.gridgain.local/172.25.1.49:47500, /127.0.0.1:47500], 
> discPort=47500, order=38, intOrder=37, lastExchangeTime=1518686253183, 
> loc=true, ver=2.3.0#20180213-sha1:756ae8d4, isClient=false], topVer=38, 
> nodeId8=b8684b5c, msg=null, type=NODE_JOINED, tstamp=1518686266006], 
> nodeId=b8684b5c, evt=NODE_JOINED]
> java.lang.ClassCastException: 
> org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl cannot be cast 
> to 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryEx
>  at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.getPageMemoryForCacheGroup(GridCacheDatabaseSharedManager.java:1595)
>  at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1533)
>  at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:568)
>  at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:724)
>  at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:611)
>  at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
>  at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This happens if the node was killed during a checkpoint (seemingly only during 
> the first one).
> Load config:
>  * Yardstick with CacheRandomOperationBenchmark
>  * 12 client nodes, 24 server nodes, 12 hosts (2 per host). The issue is also 
> reproduced when restarted node is 1 per host.
>  * Several caches with different configs: pds/in memory, tx/atomic, 
> with/without eviction etc. No dynamic caches. Complete configs are attached.
>  * 1 node is restarted periodically.
> Logs of the restarted node are attached.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
