[ 
https://issues.apache.org/jira/browse/IGNITE-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988954#comment-16988954
 ] 

Alexei Scherbakov commented on IGNITE-11704:
--------------------------------------------

Better to store a full log in case of TC history cleanup.

{noformat}
java.lang.AssertionError: Failed to wait for tombstone cleanup: 
distributed.CacheRemoveWithTombstonesLoadTest2 expected:<0> but was:<1>
        at 
org.apache.ignite.internal.processors.cache.distributed.CacheRemoveWithTombstonesLoadTest.waitTombstoneCleanup(CacheRemoveWithTombstonesLoadTest.java:335)
        at 
org.apache.ignite.internal.processors.cache.distributed.CacheRemoveWithTombstonesLoadTest.removeAndRebalance(CacheRemoveWithTombstonesLoadTest.java:250)
------- Stdout: -------
[12:28:21]    __________  ________________ 
[12:28:21]   /  _/ ___/ |/ /  _/_  __/ __/ 
[12:28:21]  _/ // (7 7    // /  / / / _/   
[12:28:21] /___/\___/_/|_/___/ /_/ /___/  
[12:28:21] 
[12:28:21] ver. 2.8.0-SNAPSHOT#20191203-sha1:DEV
[12:28:21] 2019 Copyright(C) Apache Software Foundation
[12:28:21] 
[12:28:21] Ignite documentation: http://ignite.apache.org
[12:28:21] 
[12:28:21] Quiet mode.
[12:28:21]   ^-- Logging by 'GridTestLog4jLogger [quiet=true, config=null]'
[12:28:21]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or 
"-v" to ignite.{sh|bat}
[12:28:21] 
[12:28:21] OS: Linux 4.15.0-54-generic amd64
[12:28:21] VM information: Java(TM) SE Runtime Environment 1.8.0_212-b10 Oracle 
Corporation Java HotSpot(TM) 64-Bit Server VM 25.212-b10
[12:28:21] Configured plugins:
[12:28:21]   ^-- StanByClusterTestProvider 1.0
[12:28:21]   ^-- null
[12:28:21] 
[12:28:21]   ^-- PageMemory tracker plugin 1.0
[12:28:21]   ^-- 
[12:28:21] 
[12:28:21]   ^-- TestDistibutedConfigurationPlugin 1.0
[12:28:21]   ^-- 
[12:28:21] 
[12:28:21]   ^-- NodeValidationPluginProvider 1.0
[12:28:21]   ^-- 
[12:28:21] 
[12:28:21] Configured failure handler: [hnd=NoOpFailureHandler 
[super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[12:28:21] Message queue limit is set to 0 which may lead to potential OOMEs 
when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to 
message queues growth on sender and receiver sides.
[12:28:21] Security status [authentication=off, tls/ssl=off]
[12:28:21] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[12:28:21] Data Regions Configured:
[12:28:21]   ^-- default [initSize=256.0 MiB, maxSize=18.9 GiB, 
persistence=false, lazyMemoryAllocation=true]
[12:28:21] 
[12:28:21] Ignite node started OK (id=a34d6c79, instance 
name=distributed.CacheRemoveWithTombstonesLoadTest0)
[12:28:21] Topology snapshot [ver=1, locNode=a34d6c79, servers=1, clients=0, 
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:23]    __________  ________________ 
[12:28:23]   /  _/ ___/ |/ /  _/_  __/ __/ 
[12:28:23]  _/ // (7 7    // /  / / / _/   
[12:28:23] /___/\___/_/|_/___/ /_/ /___/  
[12:28:23] 
[12:28:23] ver. 2.8.0-SNAPSHOT#20191203-sha1:DEV
[12:28:23] 2019 Copyright(C) Apache Software Foundation
[12:28:23] 
[12:28:23] Ignite documentation: http://ignite.apache.org
[12:28:23] 
[12:28:23] Quiet mode.
[12:28:23]   ^-- Logging by 'GridTestLog4jLogger [quiet=true, config=null]'
[12:28:23]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or 
"-v" to ignite.{sh|bat}
[12:28:23] 
[12:28:23] OS: Linux 4.15.0-54-generic amd64
[12:28:23] VM information: Java(TM) SE Runtime Environment 1.8.0_212-b10 Oracle 
Corporation Java HotSpot(TM) 64-Bit Server VM 25.212-b10
[12:28:23] Configured plugins:
[12:28:23]   ^-- StanByClusterTestProvider 1.0
[12:28:23]   ^-- null
[12:28:23] 
[12:28:23]   ^-- PageMemory tracker plugin 1.0
[12:28:23]   ^-- 
[12:28:23] 
[12:28:23]   ^-- TestDistibutedConfigurationPlugin 1.0
[12:28:23]   ^-- 
[12:28:23] 
[12:28:23]   ^-- NodeValidationPluginProvider 1.0
[12:28:23]   ^-- 
[12:28:23] 
[12:28:23] Configured failure handler: [hnd=NoOpFailureHandler 
[super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[12:28:23] Message queue limit is set to 0 which may lead to potential OOMEs 
when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to 
message queues growth on sender and receiver sides.
[12:28:23] Security status [authentication=off, tls/ssl=off]
[12:28:23] Joining node doesn't have encryption data 
[node=3205dca1-2c61-4f76-8475-e0c5c0f00001]
[12:28:23] Topology snapshot [ver=2, locNode=a34d6c79, servers=2, clients=0, 
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:29] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[12:28:29] Data Regions Configured:
[12:28:29]   ^-- default [initSize=256.0 MiB, maxSize=18.9 GiB, 
persistence=false, lazyMemoryAllocation=true]
[12:28:29] 
[12:28:29] Ignite node started OK (id=3205dca1, instance 
name=distributed.CacheRemoveWithTombstonesLoadTest1)
[12:28:29] Topology snapshot [ver=2, locNode=3205dca1, servers=2, clients=0, 
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:31]    __________  ________________ 
[12:28:31]   /  _/ ___/ |/ /  _/_  __/ __/ 
[12:28:31]  _/ // (7 7    // /  / / / _/   
[12:28:31] /___/\___/_/|_/___/ /_/ /___/  
[12:28:31] 
[12:28:31] ver. 2.8.0-SNAPSHOT#20191203-sha1:DEV
[12:28:31] 2019 Copyright(C) Apache Software Foundation
[12:28:31] 
[12:28:31] Ignite documentation: http://ignite.apache.org
[12:28:31] 
[12:28:31] Quiet mode.
[12:28:31]   ^-- Logging by 'GridTestLog4jLogger [quiet=true, config=null]'
[12:28:31]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or 
"-v" to ignite.{sh|bat}
[12:28:31] 
[12:28:31] OS: Linux 4.15.0-54-generic amd64
[12:28:31] VM information: Java(TM) SE Runtime Environment 1.8.0_212-b10 Oracle 
Corporation Java HotSpot(TM) 64-Bit Server VM 25.212-b10
[12:28:31] Configured plugins:
[12:28:31]   ^-- StanByClusterTestProvider 1.0
[12:28:31]   ^-- null
[12:28:31] 
[12:28:31]   ^-- PageMemory tracker plugin 1.0
[12:28:31]   ^-- 
[12:28:31] 
[12:28:31]   ^-- TestDistibutedConfigurationPlugin 1.0
[12:28:31]   ^-- 
[12:28:31] 
[12:28:31]   ^-- NodeValidationPluginProvider 1.0
[12:28:31]   ^-- 
[12:28:31] 
[12:28:31] Configured failure handler: [hnd=NoOpFailureHandler 
[super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[12:28:31] Message queue limit is set to 0 which may lead to potential OOMEs 
when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to 
message queues growth on sender and receiver sides.
[12:28:31] Security status [authentication=off, tls/ssl=off]
[12:28:31] Joining node doesn't have encryption data 
[node=29f97d83-c488-44cc-95ad-cf9777a00002]
[12:28:31] Topology snapshot [ver=3, locNode=a34d6c79, servers=3, clients=0, 
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:31] Topology snapshot [ver=3, locNode=3205dca1, servers=3, clients=0, 
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:34] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[12:28:34] Data Regions Configured:
[12:28:34]   ^-- default [initSize=256.0 MiB, maxSize=18.9 GiB, 
persistence=false, lazyMemoryAllocation=true]
[12:28:34] 
[12:28:34] Ignite node started OK (id=29f97d83, instance 
name=distributed.CacheRemoveWithTombstonesLoadTest2)
[12:28:34] Topology snapshot [ver=3, locNode=29f97d83, servers=3, clients=0, 
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:29:06] Topology snapshot [ver=4, locNode=3205dca1, servers=2, clients=0, 
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:29:06] Coordinator changed [prev=TcpDiscoveryNode 
[id=a34d6c79-3bd7-4748-83da-51805b700000, 
consistentId=distributed.CacheRemoveWithTombstonesLoadTest0, addrs=ArrayList 
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1575365303240, loc=false, 
ver=2.8.0#20191203-sha1:00000000, isClient=false], cur=TcpDiscoveryNode 
[id=3205dca1-2c61-4f76-8475-e0c5c0f00001, 
consistentId=distributed.CacheRemoveWithTombstonesLoadTest1, addrs=ArrayList 
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=2, 
intOrder=2, lastExchangeTime=1575365346714, loc=true, 
ver=2.8.0#20191203-sha1:00000000, isClient=false]]
[12:29:06] Topology snapshot [ver=4, locNode=29f97d83, servers=2, clients=0, 
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:29:06] Coordinator changed [prev=TcpDiscoveryNode 
[id=a34d6c79-3bd7-4748-83da-51805b700000, 
consistentId=distributed.CacheRemoveWithTombstonesLoadTest0, addrs=ArrayList 
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1575365311262, loc=false, 
ver=2.8.0#20191203-sha1:00000000, isClient=false], cur=TcpDiscoveryNode 
[id=3205dca1-2c61-4f76-8475-e0c5c0f00001, 
consistentId=distributed.CacheRemoveWithTombstonesLoadTest1, addrs=ArrayList 
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=2, 
intOrder=2, lastExchangeTime=1575365311262, loc=false, 
ver=2.8.0#20191203-sha1:00000000, isClient=false]]
[12:29:06] Ignite node stopped OK 
[name=distributed.CacheRemoveWithTombstonesLoadTest0, uptime=00:00:44.905]
[12:29:06] Topology snapshot [ver=5, locNode=29f97d83, servers=1, clients=0, 
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:29:06] Coordinator changed [prev=TcpDiscoveryNode 
[id=3205dca1-2c61-4f76-8475-e0c5c0f00001, 
consistentId=distributed.CacheRemoveWithTombstonesLoadTest1, addrs=ArrayList 
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=2, 
intOrder=2, lastExchangeTime=1575365311262, loc=false, 
ver=2.8.0#20191203-sha1:00000000, isClient=false], cur=TcpDiscoveryNode 
[id=29f97d83-c488-44cc-95ad-cf9777a00002, 
consistentId=distributed.CacheRemoveWithTombstonesLoadTest2, addrs=ArrayList 
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47502], discPort=47502, order=3, 
intOrder=3, lastExchangeTime=1575365346765, loc=true, 
ver=2.8.0#20191203-sha1:00000000, isClient=false]]
[12:29:06] Ignite node stopped OK 
[name=distributed.CacheRemoveWithTombstonesLoadTest1, uptime=00:00:37.249]
[12:29:06] Ignite node stopped OK 
[name=distributed.CacheRemoveWithTombstonesLoadTest2, uptime=00:00:32.485]
------- Stderr: -------
[2019-12-03 12:28:21,801][WARN 
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][G] Ignite 
work directory is not provided, automatically resolved to: 
/opt/buildagent/work/7bc1c54bc719b67c/work
[2019-12-03 12:28:21,807][WARN 
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][CacheRemoveWithTombstonesLoadTest0]
 Peer class loading is enabled (disable it in production for performance and 
deployment consistency reasons)
[2019-12-03 12:28:21,813][WARN 
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][TcpCommunicationSpi]
 Message queue limit is set to 0 which may lead to potential OOMEs when running 
cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues 
growth on sender and receiver sides.
[2019-12-03 12:28:21,816][WARN 
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][GridCollisionManager]
 Collision resolution is disabled (all jobs will be activated upon arrival).
[2019-12-03 12:28:21,821][WARN 
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][IgniteCacheDatabaseSharedManager]
 DataRegionConfiguration.maxWalArchiveSize instead 
DataRegionConfiguration.walHistorySize would be used for removing old archive 
wal files
[2019-12-03 12:28:23,190][WARN ][async-callable-runner-1][G] Ignite work 
directory is not provided, automatically resolved to: 
/opt/buildagent/work/7bc1c54bc719b67c/work
[2019-12-03 12:28:23,198][WARN 
][async-callable-runner-1][CacheRemoveWithTombstonesLoadTest1] Peer class 
loading is enabled (disable it in production for performance and deployment 
consistency reasons)
[2019-12-03 12:28:23,209][WARN ][async-callable-runner-1][TcpCommunicationSpi] 
Message queue limit is set to 0 which may lead to potential OOMEs when running 
cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues 
growth on sender and receiver sides.
[2019-12-03 12:28:23,212][WARN ][async-callable-runner-1][GridCollisionManager] 
Collision resolution is disabled (all jobs will be activated upon arrival).
[2019-12-03 12:28:31,217][WARN ][async-callable-runner-1][G] Ignite work 
directory is not provided, automatically resolved to: 
/opt/buildagent/work/7bc1c54bc719b67c/work
[2019-12-03 12:28:31,227][WARN 
][async-callable-runner-1][CacheRemoveWithTombstonesLoadTest2] Peer class 
loading is enabled (disable it in production for performance and deployment 
consistency reasons)
[2019-12-03 12:28:31,234][WARN ][async-callable-runner-1][TcpCommunicationSpi] 
Message queue limit is set to 0 which may lead to potential OOMEs when running 
cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues 
growth on sender and receiver sides.
[2019-12-03 12:28:31,238][WARN ][async-callable-runner-1][GridCollisionManager] 
Collision resolution is disabled (all jobs will be activated upon arrival).
[2019-12-03 
12:28:34,240][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][IgniteTestResources]
 Critical system error detected. Will be handled accordingly to configured 
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=CRITICAL_ERROR, err=class 
o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is 
corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905, 
val2=844420635196573]], msg=Runtime failure on cursor iteration]]]
class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905, 
val2=844420635196573]], msg=Runtime failure on cursor iteration]
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5927)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5438)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5661)
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl$1.next(IgniteCacheOffheapManagerImpl.java:3020)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition$2.hasNextX(GridDhtLocalPartition.java:1226)
        at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.doClear(GridDhtLocalPartition.java:1295)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearTombstones(GridDhtLocalPartition.java:1242)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$ClearTombstonesTask.run0(PartitionsEvictManager.java:673)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$AbstractEvictionTask.run(PartitionsEvictManager.java:587)
        at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7061)
        at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
        at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AssertionError: Key is not ready: CacheDataRowAdapter 
[key=null, val=null, expireTime=-1, ver=null, cacheId=0, link=0001003c000077d8]
        at 
org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.key(CacheDataRowAdapter.java:837)
        at 
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:382)
        at 
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5200)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.findLowerBound(BPlusTree.java:5317)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5588)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5376)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5428)
        ... 14 more
[2019-12-03 
12:28:34,248][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][FailureProcessor]
 A critical problem with persistence data structures was detected. Please make 
backup of persistence storage and WAL files for further analysis. Persistence 
storage path: null WAL path: db/wal WAL archive path: db/wal/archive
[2019-12-03 12:28:34,250][WARN 
][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][CacheDiagnosticManager]
 Page locks dump:

Thread=[name=rebalance-#74947%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94924], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=rebalance-#74947%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=rebalance-#74949%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94926], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=rebalance-#74949%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=rebalance-#74951%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94928], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=rebalance-#74951%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=rebalance-#74952%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94929], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=rebalance-#74952%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-#74933%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94910], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74933%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-#74934%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94911], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74934%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-#74936%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94913], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74936%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-#74937%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94914], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74937%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94915], state=RUNNABLE
Locked pages = []
Locked pages log: 
name=sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-#74939%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94916], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74939%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-#74940%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94917], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74940%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-#74941%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94918], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74941%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-stripe-0-#74896%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94862], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-0-#74896%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-stripe-1-#74897%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94863], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-1-#74897%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-stripe-2-#74898%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94864], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-2-#74898%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-stripe-3-#74899%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94865], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-3-#74899%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-stripe-4-#74900%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94866], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-4-#74900%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-stripe-5-#74901%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94867], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-5-#74901%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-stripe-6-#74902%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94868], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-6-#74902%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)


Thread=[name=sys-stripe-7-#74903%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94869], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-7-#74903%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314249, 2019-12-03 12:28:34.249)



[2019-12-03 
12:28:34,261][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][PartitionsEvictManager]
 Partition eviction failed.
class org.apache.ignite.IgniteException: B+Tree is corrupted [pages(groupId, 
pageId)=[IgniteBiTuple [val1=1544803905, val2=844420635196573]], msg=Runtime 
failure on cursor iteration]
        at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:48)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.doClear(GridDhtLocalPartition.java:1295)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearTombstones(GridDhtLocalPartition.java:1242)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$ClearTombstonesTask.run0(PartitionsEvictManager.java:673)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$AbstractEvictionTask.run(PartitionsEvictManager.java:587)
        at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7061)
        at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
        at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905, 
val2=844420635196573]], msg=Runtime failure on cursor iteration]
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5927)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5438)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5661)
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl$1.next(IgniteCacheOffheapManagerImpl.java:3020)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition$2.hasNextX(GridDhtLocalPartition.java:1226)
        at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
        ... 10 more
Caused by: java.lang.AssertionError: Key is not ready: CacheDataRowAdapter 
[key=null, val=null, expireTime=-1, ver=null, cacheId=0, link=0001003c000077d8]
        at 
org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.key(CacheDataRowAdapter.java:837)
        at 
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:382)
        at 
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5200)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.findLowerBound(BPlusTree.java:5317)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5588)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5376)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5428)
        ... 14 more
[2019-12-03 
12:28:34,268][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][IgniteTestResources]
 Critical system error detected. Will be handled accordingly to configured 
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: B+Tree is 
corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905, 
val2=844420635196573]], msg=Runtime failure on cursor iteration]]]
class org.apache.ignite.IgniteException: B+Tree is corrupted [pages(groupId, 
pageId)=[IgniteBiTuple [val1=1544803905, val2=844420635196573]], msg=Runtime 
failure on cursor iteration]
        at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:48)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.doClear(GridDhtLocalPartition.java:1295)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearTombstones(GridDhtLocalPartition.java:1242)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$ClearTombstonesTask.run0(PartitionsEvictManager.java:673)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$AbstractEvictionTask.run(PartitionsEvictManager.java:587)
        at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7061)
        at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
        at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905, 
val2=844420635196573]], msg=Runtime failure on cursor iteration]
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5927)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5438)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5661)
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl$1.next(IgniteCacheOffheapManagerImpl.java:3020)
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition$2.hasNextX(GridDhtLocalPartition.java:1226)
        at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
        ... 10 more
Caused by: java.lang.AssertionError: Key is not ready: CacheDataRowAdapter 
[key=null, val=null, expireTime=-1, ver=null, cacheId=0, link=0001003c000077d8]
        at 
org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.key(CacheDataRowAdapter.java:837)
        at 
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:382)
        at 
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5200)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.findLowerBound(BPlusTree.java:5317)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5588)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5376)
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5428)
        ... 14 more
[2019-12-03 
12:28:34,275][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][FailureProcessor]
 A critical problem with persistence data structures was detected. Please make 
backup of persistence storage and WAL files for further analysis. Persistence 
storage path: null WAL path: db/wal WAL archive path: db/wal/archive
[2019-12-03 12:28:34,277][WARN 
][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][CacheDiagnosticManager]
 Page locks dump:

Thread=[name=rebalance-#74947%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94924], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=rebalance-#74947%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=rebalance-#74949%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94926], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=rebalance-#74949%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=rebalance-#74951%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94928], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=rebalance-#74951%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=rebalance-#74952%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94929], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=rebalance-#74952%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-#74933%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94910], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74933%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-#74934%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94911], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74934%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-#74936%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94913], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74936%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-#74937%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94914], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74937%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94915], state=RUNNABLE
Locked pages = []
Locked pages log: 
name=sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-#74939%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94916], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74939%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-#74940%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94917], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74940%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-#74941%distributed.CacheRemoveWithTombstonesLoadTest2%, 
id=94918], state=TIMED_WAITING
Locked pages = []
Locked pages log: 
name=sys-#74941%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-stripe-0-#74896%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94862], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-0-#74896%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-stripe-1-#74897%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94863], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-1-#74897%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-stripe-2-#74898%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94864], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-2-#74898%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-stripe-3-#74899%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94865], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-3-#74899%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-stripe-4-#74900%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94866], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-4-#74900%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-stripe-5-#74901%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94867], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-5-#74901%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-stripe-6-#74902%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94868], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-6-#74902%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)


Thread=[name=sys-stripe-7-#74903%distributed.CacheRemoveWithTombstonesLoadTest2%,
 id=94869], state=WAITING
Locked pages = []
Locked pages log: 
name=sys-stripe-7-#74903%distributed.CacheRemoveWithTombstonesLoadTest2% 
time=(1575365314276, 2019-12-03 12:28:34.276)



[2019-12-03 12:29:06,839][ERROR][main][root] Test failed.
java.lang.AssertionError: Failed to wait for tombstone cleanup: 
distributed.CacheRemoveWithTombstonesLoadTest2 expected:<0> but was:<1>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at 
org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:114)
        at 
org.apache.ignite.internal.processors.cache.distributed.CacheRemoveWithTombstonesLoadTest.waitTombstoneCleanup(CacheRemoveWithTombstonesLoadTest.java:335)
        at 
org.apache.ignite.internal.processors.cache.distributed.CacheRemoveWithTombstonesLoadTest.removeAndRebalance(CacheRemoveWithTombstonesLoadTest.java:250)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at 
org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2090)
{noformat}

> Write tombstones during rebalance to get rid of deferred delete buffer
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-11704
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11704
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Goncharuk
>            Priority: Major
>              Labels: rebalance
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently Ignite relies on deferred delete buffer in order to handle 
> write-remove conflicts during rebalance. Given the limit size of the buffer, 
> this approach is fundamentally flawed, especially in case when persistence is 
> enabled.
> I suggest to extend the logic of data storage to be able to store key 
> tombstones - to keep version for deleted entries. The tombstones will be 
> stored when rebalance is in progress and should be cleaned up when rebalance 
> is completed.
> Later this approach may be used to implement fast partition rebalance based 
> on merkle trees (in this case, tombstones should be written on an incomplete 
> baseline).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to