[ 
https://issues.apache.org/jira/browse/IGNITE-8066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Pavlov updated IGNITE-8066:
-----------------------------------
    Description: 
1) On activation grid read checkpoint status with segment idx=7742:
{noformat}
2018-03-21 02:34:04.465[INFO 
]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully activated caches [nodeId=9c0c2e76-fb7f-46df-8b0b-3379d0c91db9, 
clie
nt=false, topVer=AffinityTopologyVersion [topVer=161, minorTopVer=1]]
2018-03-21 02:34:04.479[INFO 
]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Finished waiting for partition release future [topVer=AffinityTopologyVersion 
[t
opVer=161, minorTopVer=1], waitTime=0ms, futInfo=NA]
2018-03-21 02:34:04.487[INFO 
]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Read checkpoint status 
[startMarker=/gridgain/ssd/data/10_126_1_172_47500/cp/15215870
60132-aafbf88b-f783-40e8-8e3c-ef60cd383e21-START.bin, 
endMarker=/gridgain/ssd/data/10_126_1_172_47500/cp/1521587060132-aafbf88b-f783-40e8-8e3c-ef60cd383e21-END.bin]
2018-03-21 02:34:04.488[INFO 
]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Applying lost cache updates since last checkpoint record 
[lastMarked=FileWALPointer [
idx=7742, fileOff=1041057120, len=1470746], 
lastCheckpointId=aafbf88b-f783-40e8-8e3c-ef60cd383e21]
{noformat}
2) but right after it (with only two metrics messages in log between it) write 
checkpoint with wal segment idx=0
 {noformat}
2018-03-21 02:35:21.875[INFO 
]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Finished applying WAL changes [updatesApplied=0, time=77388ms]
2018-03-21 02:35:22.386[INFO 
]db-checkpoint-thread-#243%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Checkpoint started [checkpointId=8cf946e6-a718-4388-8bef-c76bf79d93cd, 
startPtr=
FileWALPointer [idx=0, fileOff=77196029, len=450864], checkpointLockWait=0ms, 
checkpointLockHoldTime=422ms, pages=16379, reason='node started']
2018-03-21 02:35:25.934[INFO 
]db-checkpoint-thread-#243%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Checkpoint finished [cpId=8cf946e6-a718-4388-8bef-c76bf79d93cd, pages=16379, 
mar
kPos=FileWALPointer [idx=0, fileOff=77196029, len=450864], 
walSegmentsCleared=0, markDuration=508ms, pagesWrite=155ms, fsync=3391ms, 
total=4054ms] 
{noformat}
Then we get some AssertionError while trying to archive wal segment 0 when 
lastArchivedIdx=7742

  was:
1) On activation grid read checkpoint status with segment idx=7742:

2018-03-21 02:34:04.465[INFO 
][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully activated caches [nodeId=9c0c2e76-fb7f-46df-8b0b-3379d0c91db9, 
clie
nt=false, topVer=AffinityTopologyVersion [topVer=161, minorTopVer=1]]
2018-03-21 02:34:04.479[INFO 
][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Finished waiting for partition release future [topVer=AffinityTopologyVersion 
[t
opVer=161, minorTopVer=1], waitTime=0ms, futInfo=NA]
2018-03-21 02:34:04.487[INFO 
][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Read checkpoint status 
[startMarker=/gridgain/ssd/data/10_126_1_172_47500/cp/15215870
60132-aafbf88b-f783-40e8-8e3c-ef60cd383e21-START.bin, 
endMarker=/gridgain/ssd/data/10_126_1_172_47500/cp/1521587060132-aafbf88b-f783-40e8-8e3c-ef60cd383e21-END.bin]
2018-03-21 02:34:04.488[INFO 
][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Applying lost cache updates since last checkpoint record 
[lastMarked=FileWALPointer [
idx=7742, fileOff=1041057120, len=1470746], 
lastCheckpointId=aafbf88b-f783-40e8-8e3c-ef60cd383e21]

2) but right after it (with only two metrics messages in log between it) write 
checkpoint with wal segment idx=0

2018-03-21 02:35:21.875[INFO 
][exchange-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Finished applying WAL changes [updatesApplied=0, time=77388ms]
2018-03-21 02:35:22.386[INFO 
][db-checkpoint-thread-#243%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Checkpoint started [checkpointId=8cf946e6-a718-4388-8bef-c76bf79d93cd, 
startPtr=
FileWALPointer [idx=0, fileOff=77196029, len=450864], checkpointLockWait=0ms, 
checkpointLockHoldTime=422ms, pages=16379, reason='node started']
2018-03-21 02:35:25.934[INFO 
][db-checkpoint-thread-#243%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
 Checkpoint finished [cpId=8cf946e6-a718-4388-8bef-c76bf79d93cd, pages=16379, 
mar
kPos=FileWALPointer [idx=0, fileOff=77196029, len=450864], 
walSegmentsCleared=0, markDuration=508ms, pagesWrite=155ms, fsync=3391ms, 
total=4054ms]

Then we get some AssertionError while trying to archive wal segment 0 when 
lastArchivedIdx=7742


> Reset wal segment idx
> ---------------------
>
>                 Key: IGNITE-8066
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8066
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 2.4
>            Reporter: Alexander Belyak
>            Assignee: Stanilovsky Evgeny
>            Priority: Major
>             Fix For: 2.5
>
>         Attachments: tc.png
>
>
> 1) On activation grid read checkpoint status with segment idx=7742:
> {noformat}
> 2018-03-21 02:34:04.465[INFO 
> ]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully activated caches [nodeId=9c0c2e76-fb7f-46df-8b0b-3379d0c91db9, 
> clie
> nt=false, topVer=AffinityTopologyVersion [topVer=161, minorTopVer=1]]
> 2018-03-21 02:34:04.479[INFO 
> ]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Finished waiting for partition release future 
> [topVer=AffinityTopologyVersion [t
> opVer=161, minorTopVer=1], waitTime=0ms, futInfo=NA]
> 2018-03-21 02:34:04.487[INFO 
> ]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
>  Read checkpoint status 
> [startMarker=/gridgain/ssd/data/10_126_1_172_47500/cp/15215870
> 60132-aafbf88b-f783-40e8-8e3c-ef60cd383e21-START.bin, 
> endMarker=/gridgain/ssd/data/10_126_1_172_47500/cp/1521587060132-aafbf88b-f783-40e8-8e3c-ef60cd383e21-END.bin]
> 2018-03-21 02:34:04.488[INFO 
> ]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
>  Applying lost cache updates since last checkpoint record 
> [lastMarked=FileWALPointer [
> idx=7742, fileOff=1041057120, len=1470746], 
> lastCheckpointId=aafbf88b-f783-40e8-8e3c-ef60cd383e21]
> {noformat}
> 2) but right after it (with only two metrics messages in log between it) 
> write checkpoint with wal segment idx=0
>  {noformat}
> 2018-03-21 02:35:21.875[INFO 
> ]exchange-worker-#152%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
>  Finished applying WAL changes [updatesApplied=0, time=77388ms]
> 2018-03-21 02:35:22.386[INFO 
> ]db-checkpoint-thread-#243%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
>  Checkpoint started [checkpointId=8cf946e6-a718-4388-8bef-c76bf79d93cd, 
> startPtr=
> FileWALPointer [idx=0, fileOff=77196029, len=450864], checkpointLockWait=0ms, 
> checkpointLockHoldTime=422ms, pages=16379, reason='node started']
> 2018-03-21 02:35:25.934[INFO 
> ]db-checkpoint-thread-#243%DPL_GRID%DplGridNodeName%[o.a.i.i.p.c.p.GridCacheDatabaseSharedManager]
>  Checkpoint finished [cpId=8cf946e6-a718-4388-8bef-c76bf79d93cd, pages=16379, 
> mar
> kPos=FileWALPointer [idx=0, fileOff=77196029, len=450864], 
> walSegmentsCleared=0, markDuration=508ms, pagesWrite=155ms, fsync=3391ms, 
> total=4054ms] 
> {noformat}
> Then we get some AssertionError while trying to archive wal segment 0 when 
> lastArchivedIdx=7742



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to