[
https://issues.apache.org/jira/browse/IGNITE-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988954#comment-16988954
]
Alexei Scherbakov commented on IGNITE-11704:
--------------------------------------------
Better to store a full log in case of TC history cleanup.
{noformat}
java.lang.AssertionError: Failed to wait for tombstone cleanup:
distributed.CacheRemoveWithTombstonesLoadTest2 expected:<0> but was:<1>
at
org.apache.ignite.internal.processors.cache.distributed.CacheRemoveWithTombstonesLoadTest.waitTombstoneCleanup(CacheRemoveWithTombstonesLoadTest.java:335)
at
org.apache.ignite.internal.processors.cache.distributed.CacheRemoveWithTombstonesLoadTest.removeAndRebalance(CacheRemoveWithTombstonesLoadTest.java:250)
------- Stdout: -------
[12:28:21] __________ ________________
[12:28:21] / _/ ___/ |/ / _/_ __/ __/
[12:28:21] _/ // (7 7 // / / / / _/
[12:28:21] /___/\___/_/|_/___/ /_/ /___/
[12:28:21]
[12:28:21] ver. 2.8.0-SNAPSHOT#20191203-sha1:DEV
[12:28:21] 2019 Copyright(C) Apache Software Foundation
[12:28:21]
[12:28:21] Ignite documentation: http://ignite.apache.org
[12:28:21]
[12:28:21] Quiet mode.
[12:28:21] ^-- Logging by 'GridTestLog4jLogger [quiet=true, config=null]'
[12:28:21] ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or
"-v" to ignite.{sh|bat}
[12:28:21]
[12:28:21] OS: Linux 4.15.0-54-generic amd64
[12:28:21] VM information: Java(TM) SE Runtime Environment 1.8.0_212-b10 Oracle
Corporation Java HotSpot(TM) 64-Bit Server VM 25.212-b10
[12:28:21] Configured plugins:
[12:28:21] ^-- StanByClusterTestProvider 1.0
[12:28:21] ^-- null
[12:28:21]
[12:28:21] ^-- PageMemory tracker plugin 1.0
[12:28:21] ^--
[12:28:21]
[12:28:21] ^-- TestDistibutedConfigurationPlugin 1.0
[12:28:21] ^--
[12:28:21]
[12:28:21] ^-- NodeValidationPluginProvider 1.0
[12:28:21] ^--
[12:28:21]
[12:28:21] Configured failure handler: [hnd=NoOpFailureHandler
[super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[12:28:21] Message queue limit is set to 0 which may lead to potential OOMEs
when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to
message queues growth on sender and receiver sides.
[12:28:21] Security status [authentication=off, tls/ssl=off]
[12:28:21] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[12:28:21] Data Regions Configured:
[12:28:21] ^-- default [initSize=256.0 MiB, maxSize=18.9 GiB,
persistence=false, lazyMemoryAllocation=true]
[12:28:21]
[12:28:21] Ignite node started OK (id=a34d6c79, instance
name=distributed.CacheRemoveWithTombstonesLoadTest0)
[12:28:21] Topology snapshot [ver=1, locNode=a34d6c79, servers=1, clients=0,
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:23] __________ ________________
[12:28:23] / _/ ___/ |/ / _/_ __/ __/
[12:28:23] _/ // (7 7 // / / / / _/
[12:28:23] /___/\___/_/|_/___/ /_/ /___/
[12:28:23]
[12:28:23] ver. 2.8.0-SNAPSHOT#20191203-sha1:DEV
[12:28:23] 2019 Copyright(C) Apache Software Foundation
[12:28:23]
[12:28:23] Ignite documentation: http://ignite.apache.org
[12:28:23]
[12:28:23] Quiet mode.
[12:28:23] ^-- Logging by 'GridTestLog4jLogger [quiet=true, config=null]'
[12:28:23] ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or
"-v" to ignite.{sh|bat}
[12:28:23]
[12:28:23] OS: Linux 4.15.0-54-generic amd64
[12:28:23] VM information: Java(TM) SE Runtime Environment 1.8.0_212-b10 Oracle
Corporation Java HotSpot(TM) 64-Bit Server VM 25.212-b10
[12:28:23] Configured plugins:
[12:28:23] ^-- StanByClusterTestProvider 1.0
[12:28:23] ^-- null
[12:28:23]
[12:28:23] ^-- PageMemory tracker plugin 1.0
[12:28:23] ^--
[12:28:23]
[12:28:23] ^-- TestDistibutedConfigurationPlugin 1.0
[12:28:23] ^--
[12:28:23]
[12:28:23] ^-- NodeValidationPluginProvider 1.0
[12:28:23] ^--
[12:28:23]
[12:28:23] Configured failure handler: [hnd=NoOpFailureHandler
[super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[12:28:23] Message queue limit is set to 0 which may lead to potential OOMEs
when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to
message queues growth on sender and receiver sides.
[12:28:23] Security status [authentication=off, tls/ssl=off]
[12:28:23] Joining node doesn't have encryption data
[node=3205dca1-2c61-4f76-8475-e0c5c0f00001]
[12:28:23] Topology snapshot [ver=2, locNode=a34d6c79, servers=2, clients=0,
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:29] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[12:28:29] Data Regions Configured:
[12:28:29] ^-- default [initSize=256.0 MiB, maxSize=18.9 GiB,
persistence=false, lazyMemoryAllocation=true]
[12:28:29]
[12:28:29] Ignite node started OK (id=3205dca1, instance
name=distributed.CacheRemoveWithTombstonesLoadTest1)
[12:28:29] Topology snapshot [ver=2, locNode=3205dca1, servers=2, clients=0,
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:31] __________ ________________
[12:28:31] / _/ ___/ |/ / _/_ __/ __/
[12:28:31] _/ // (7 7 // / / / / _/
[12:28:31] /___/\___/_/|_/___/ /_/ /___/
[12:28:31]
[12:28:31] ver. 2.8.0-SNAPSHOT#20191203-sha1:DEV
[12:28:31] 2019 Copyright(C) Apache Software Foundation
[12:28:31]
[12:28:31] Ignite documentation: http://ignite.apache.org
[12:28:31]
[12:28:31] Quiet mode.
[12:28:31] ^-- Logging by 'GridTestLog4jLogger [quiet=true, config=null]'
[12:28:31] ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or
"-v" to ignite.{sh|bat}
[12:28:31]
[12:28:31] OS: Linux 4.15.0-54-generic amd64
[12:28:31] VM information: Java(TM) SE Runtime Environment 1.8.0_212-b10 Oracle
Corporation Java HotSpot(TM) 64-Bit Server VM 25.212-b10
[12:28:31] Configured plugins:
[12:28:31] ^-- StanByClusterTestProvider 1.0
[12:28:31] ^-- null
[12:28:31]
[12:28:31] ^-- PageMemory tracker plugin 1.0
[12:28:31] ^--
[12:28:31]
[12:28:31] ^-- TestDistibutedConfigurationPlugin 1.0
[12:28:31] ^--
[12:28:31]
[12:28:31] ^-- NodeValidationPluginProvider 1.0
[12:28:31] ^--
[12:28:31]
[12:28:31] Configured failure handler: [hnd=NoOpFailureHandler
[super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[12:28:31] Message queue limit is set to 0 which may lead to potential OOMEs
when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to
message queues growth on sender and receiver sides.
[12:28:31] Security status [authentication=off, tls/ssl=off]
[12:28:31] Joining node doesn't have encryption data
[node=29f97d83-c488-44cc-95ad-cf9777a00002]
[12:28:31] Topology snapshot [ver=3, locNode=a34d6c79, servers=3, clients=0,
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:31] Topology snapshot [ver=3, locNode=3205dca1, servers=3, clients=0,
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:28:34] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[12:28:34] Data Regions Configured:
[12:28:34] ^-- default [initSize=256.0 MiB, maxSize=18.9 GiB,
persistence=false, lazyMemoryAllocation=true]
[12:28:34]
[12:28:34] Ignite node started OK (id=29f97d83, instance
name=distributed.CacheRemoveWithTombstonesLoadTest2)
[12:28:34] Topology snapshot [ver=3, locNode=29f97d83, servers=3, clients=0,
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:29:06] Topology snapshot [ver=4, locNode=3205dca1, servers=2, clients=0,
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:29:06] Coordinator changed [prev=TcpDiscoveryNode
[id=a34d6c79-3bd7-4748-83da-51805b700000,
consistentId=distributed.CacheRemoveWithTombstonesLoadTest0, addrs=ArrayList
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1,
intOrder=1, lastExchangeTime=1575365303240, loc=false,
ver=2.8.0#20191203-sha1:00000000, isClient=false], cur=TcpDiscoveryNode
[id=3205dca1-2c61-4f76-8475-e0c5c0f00001,
consistentId=distributed.CacheRemoveWithTombstonesLoadTest1, addrs=ArrayList
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=2,
intOrder=2, lastExchangeTime=1575365346714, loc=true,
ver=2.8.0#20191203-sha1:00000000, isClient=false]]
[12:29:06] Topology snapshot [ver=4, locNode=29f97d83, servers=2, clients=0,
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:29:06] Coordinator changed [prev=TcpDiscoveryNode
[id=a34d6c79-3bd7-4748-83da-51805b700000,
consistentId=distributed.CacheRemoveWithTombstonesLoadTest0, addrs=ArrayList
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1,
intOrder=1, lastExchangeTime=1575365311262, loc=false,
ver=2.8.0#20191203-sha1:00000000, isClient=false], cur=TcpDiscoveryNode
[id=3205dca1-2c61-4f76-8475-e0c5c0f00001,
consistentId=distributed.CacheRemoveWithTombstonesLoadTest1, addrs=ArrayList
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=2,
intOrder=2, lastExchangeTime=1575365311262, loc=false,
ver=2.8.0#20191203-sha1:00000000, isClient=false]]
[12:29:06] Ignite node stopped OK
[name=distributed.CacheRemoveWithTombstonesLoadTest0, uptime=00:00:44.905]
[12:29:06] Topology snapshot [ver=5, locNode=29f97d83, servers=1, clients=0,
state=ACTIVE, CPUs=5, offheap=19.0GB, heap=2.0GB]
[12:29:06] Coordinator changed [prev=TcpDiscoveryNode
[id=3205dca1-2c61-4f76-8475-e0c5c0f00001,
consistentId=distributed.CacheRemoveWithTombstonesLoadTest1, addrs=ArrayList
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=2,
intOrder=2, lastExchangeTime=1575365311262, loc=false,
ver=2.8.0#20191203-sha1:00000000, isClient=false], cur=TcpDiscoveryNode
[id=29f97d83-c488-44cc-95ad-cf9777a00002,
consistentId=distributed.CacheRemoveWithTombstonesLoadTest2, addrs=ArrayList
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47502], discPort=47502, order=3,
intOrder=3, lastExchangeTime=1575365346765, loc=true,
ver=2.8.0#20191203-sha1:00000000, isClient=false]]
[12:29:06] Ignite node stopped OK
[name=distributed.CacheRemoveWithTombstonesLoadTest1, uptime=00:00:37.249]
[12:29:06] Ignite node stopped OK
[name=distributed.CacheRemoveWithTombstonesLoadTest2, uptime=00:00:32.485]
------- Stderr: -------
[2019-12-03 12:28:21,801][WARN
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][G] Ignite
work directory is not provided, automatically resolved to:
/opt/buildagent/work/7bc1c54bc719b67c/work
[2019-12-03 12:28:21,807][WARN
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][CacheRemoveWithTombstonesLoadTest0]
Peer class loading is enabled (disable it in production for performance and
deployment consistency reasons)
[2019-12-03 12:28:21,813][WARN
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][TcpCommunicationSpi]
Message queue limit is set to 0 which may lead to potential OOMEs when running
cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues
growth on sender and receiver sides.
[2019-12-03 12:28:21,816][WARN
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][GridCollisionManager]
Collision resolution is disabled (all jobs will be activated upon arrival).
[2019-12-03 12:28:21,821][WARN
][test-runner-#74792%distributed.CacheRemoveWithTombstonesLoadTest%][IgniteCacheDatabaseSharedManager]
DataRegionConfiguration.maxWalArchiveSize instead
DataRegionConfiguration.walHistorySize would be used for removing old archive
wal files
[2019-12-03 12:28:23,190][WARN ][async-callable-runner-1][G] Ignite work
directory is not provided, automatically resolved to:
/opt/buildagent/work/7bc1c54bc719b67c/work
[2019-12-03 12:28:23,198][WARN
][async-callable-runner-1][CacheRemoveWithTombstonesLoadTest1] Peer class
loading is enabled (disable it in production for performance and deployment
consistency reasons)
[2019-12-03 12:28:23,209][WARN ][async-callable-runner-1][TcpCommunicationSpi]
Message queue limit is set to 0 which may lead to potential OOMEs when running
cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues
growth on sender and receiver sides.
[2019-12-03 12:28:23,212][WARN ][async-callable-runner-1][GridCollisionManager]
Collision resolution is disabled (all jobs will be activated upon arrival).
[2019-12-03 12:28:31,217][WARN ][async-callable-runner-1][G] Ignite work
directory is not provided, automatically resolved to:
/opt/buildagent/work/7bc1c54bc719b67c/work
[2019-12-03 12:28:31,227][WARN
][async-callable-runner-1][CacheRemoveWithTombstonesLoadTest2] Peer class
loading is enabled (disable it in production for performance and deployment
consistency reasons)
[2019-12-03 12:28:31,234][WARN ][async-callable-runner-1][TcpCommunicationSpi]
Message queue limit is set to 0 which may lead to potential OOMEs when running
cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues
growth on sender and receiver sides.
[2019-12-03 12:28:31,238][WARN ][async-callable-runner-1][GridCollisionManager]
Collision resolution is disabled (all jobs will be activated upon arrival).
[2019-12-03
12:28:34,240][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][IgniteTestResources]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class
o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is
corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905,
val2=844420635196573]], msg=Runtime failure on cursor iteration]]]
class
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905,
val2=844420635196573]], msg=Runtime failure on cursor iteration]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5927)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5438)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5661)
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl$1.next(IgniteCacheOffheapManagerImpl.java:3020)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition$2.hasNextX(GridDhtLocalPartition.java:1226)
at
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.doClear(GridDhtLocalPartition.java:1295)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearTombstones(GridDhtLocalPartition.java:1242)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$ClearTombstonesTask.run0(PartitionsEvictManager.java:673)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$AbstractEvictionTask.run(PartitionsEvictManager.java:587)
at
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7061)
at
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AssertionError: Key is not ready: CacheDataRowAdapter
[key=null, val=null, expireTime=-1, ver=null, cacheId=0, link=0001003c000077d8]
at
org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.key(CacheDataRowAdapter.java:837)
at
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:382)
at
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5200)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.findLowerBound(BPlusTree.java:5317)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5588)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5376)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5428)
... 14 more
[2019-12-03
12:28:34,248][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][FailureProcessor]
A critical problem with persistence data structures was detected. Please make
backup of persistence storage and WAL files for further analysis. Persistence
storage path: null WAL path: db/wal WAL archive path: db/wal/archive
[2019-12-03 12:28:34,250][WARN
][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][CacheDiagnosticManager]
Page locks dump:
Thread=[name=rebalance-#74947%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94924], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=rebalance-#74947%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=rebalance-#74949%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94926], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=rebalance-#74949%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=rebalance-#74951%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94928], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=rebalance-#74951%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=rebalance-#74952%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94929], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=rebalance-#74952%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-#74933%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94910], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74933%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-#74934%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94911], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74934%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-#74936%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94913], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74936%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-#74937%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94914], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74937%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94915], state=RUNNABLE
Locked pages = []
Locked pages log:
name=sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-#74939%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94916], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74939%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-#74940%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94917], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74940%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-#74941%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94918], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74941%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-stripe-0-#74896%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94862], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-0-#74896%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-stripe-1-#74897%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94863], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-1-#74897%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-stripe-2-#74898%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94864], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-2-#74898%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-stripe-3-#74899%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94865], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-3-#74899%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-stripe-4-#74900%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94866], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-4-#74900%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-stripe-5-#74901%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94867], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-5-#74901%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-stripe-6-#74902%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94868], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-6-#74902%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
Thread=[name=sys-stripe-7-#74903%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94869], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-7-#74903%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314249, 2019-12-03 12:28:34.249)
[2019-12-03
12:28:34,261][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][PartitionsEvictManager]
Partition eviction failed.
class org.apache.ignite.IgniteException: B+Tree is corrupted [pages(groupId,
pageId)=[IgniteBiTuple [val1=1544803905, val2=844420635196573]], msg=Runtime
failure on cursor iteration]
at
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:48)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.doClear(GridDhtLocalPartition.java:1295)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearTombstones(GridDhtLocalPartition.java:1242)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$ClearTombstonesTask.run0(PartitionsEvictManager.java:673)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$AbstractEvictionTask.run(PartitionsEvictManager.java:587)
at
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7061)
at
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905,
val2=844420635196573]], msg=Runtime failure on cursor iteration]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5927)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5438)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5661)
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl$1.next(IgniteCacheOffheapManagerImpl.java:3020)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition$2.hasNextX(GridDhtLocalPartition.java:1226)
at
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
... 10 more
Caused by: java.lang.AssertionError: Key is not ready: CacheDataRowAdapter
[key=null, val=null, expireTime=-1, ver=null, cacheId=0, link=0001003c000077d8]
at
org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.key(CacheDataRowAdapter.java:837)
at
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:382)
at
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5200)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.findLowerBound(BPlusTree.java:5317)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5588)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5376)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5428)
... 14 more
[2019-12-03
12:28:34,268][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][IgniteTestResources]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: B+Tree is
corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905,
val2=844420635196573]], msg=Runtime failure on cursor iteration]]]
class org.apache.ignite.IgniteException: B+Tree is corrupted [pages(groupId,
pageId)=[IgniteBiTuple [val1=1544803905, val2=844420635196573]], msg=Runtime
failure on cursor iteration]
at
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:48)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.doClear(GridDhtLocalPartition.java:1295)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearTombstones(GridDhtLocalPartition.java:1242)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$ClearTombstonesTask.run0(PartitionsEvictManager.java:673)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$AbstractEvictionTask.run(PartitionsEvictManager.java:587)
at
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7061)
at
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=1544803905,
val2=844420635196573]], msg=Runtime failure on cursor iteration]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5927)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5438)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5661)
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl$1.next(IgniteCacheOffheapManagerImpl.java:3020)
at
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition$2.hasNextX(GridDhtLocalPartition.java:1226)
at
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
... 10 more
Caused by: java.lang.AssertionError: Key is not ready: CacheDataRowAdapter
[key=null, val=null, expireTime=-1, ver=null, cacheId=0, link=0001003c000077d8]
at
org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.key(CacheDataRowAdapter.java:837)
at
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:382)
at
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5200)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.findLowerBound(BPlusTree.java:5317)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5588)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5376)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5428)
... 14 more
[2019-12-03
12:28:34,275][ERROR][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][FailureProcessor]
A critical problem with persistence data structures was detected. Please make
backup of persistence storage and WAL files for further analysis. Persistence
storage path: null WAL path: db/wal WAL archive path: db/wal/archive
[2019-12-03 12:28:34,277][WARN
][sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%][CacheDiagnosticManager]
Page locks dump:
Thread=[name=rebalance-#74947%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94924], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=rebalance-#74947%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=rebalance-#74949%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94926], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=rebalance-#74949%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=rebalance-#74951%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94928], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=rebalance-#74951%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=rebalance-#74952%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94929], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=rebalance-#74952%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-#74933%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94910], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74933%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-#74934%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94911], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74934%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-#74936%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94913], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74936%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-#74937%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94914], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74937%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94915], state=RUNNABLE
Locked pages = []
Locked pages log:
name=sys-#74938%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-#74939%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94916], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74939%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-#74940%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94917], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74940%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-#74941%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94918], state=TIMED_WAITING
Locked pages = []
Locked pages log:
name=sys-#74941%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-stripe-0-#74896%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94862], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-0-#74896%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-stripe-1-#74897%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94863], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-1-#74897%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-stripe-2-#74898%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94864], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-2-#74898%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-stripe-3-#74899%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94865], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-3-#74899%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-stripe-4-#74900%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94866], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-4-#74900%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-stripe-5-#74901%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94867], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-5-#74901%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-stripe-6-#74902%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94868], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-6-#74902%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
Thread=[name=sys-stripe-7-#74903%distributed.CacheRemoveWithTombstonesLoadTest2%,
id=94869], state=WAITING
Locked pages = []
Locked pages log:
name=sys-stripe-7-#74903%distributed.CacheRemoveWithTombstonesLoadTest2%
time=(1575365314276, 2019-12-03 12:28:34.276)
[2019-12-03 12:29:06,839][ERROR][main][root] Test failed.
java.lang.AssertionError: Failed to wait for tombstone cleanup:
distributed.CacheRemoveWithTombstonesLoadTest2 expected:<0> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at
org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:114)
at
org.apache.ignite.internal.processors.cache.distributed.CacheRemoveWithTombstonesLoadTest.waitTombstoneCleanup(CacheRemoveWithTombstonesLoadTest.java:335)
at
org.apache.ignite.internal.processors.cache.distributed.CacheRemoveWithTombstonesLoadTest.removeAndRebalance(CacheRemoveWithTombstonesLoadTest.java:250)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at
org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2090)
{noformat}
> Write tombstones during rebalance to get rid of deferred delete buffer
> ----------------------------------------------------------------------
>
> Key: IGNITE-11704
> URL: https://issues.apache.org/jira/browse/IGNITE-11704
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexey Goncharuk
> Priority: Major
> Labels: rebalance
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently Ignite relies on deferred delete buffer in order to handle
> write-remove conflicts during rebalance. Given the limit size of the buffer,
> this approach is fundamentally flawed, especially in case when persistence is
> enabled.
> I suggest to extend the logic of data storage to be able to store key
> tombstones - to keep version for deleted entries. The tombstones will be
> stored when rebalance is in progress and should be cleaned up when rebalance
> is completed.
> Later this approach may be used to implement fast partition rebalance based
> on merkle trees (in this case, tombstones should be written on an incomplete
> baseline).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)