[
https://issues.apache.org/jira/browse/IGNITE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802823#comment-16802823
]
Ignite TC Bot commented on IGNITE-6587:
---------------------------------------
{panel:title=--> Run :: All: Possible
Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform .NET (Core Linux){color} [[tests 0 Exit Code
|https://ci.ignite.apache.org/viewLog.html?buildId=3395254]]
{color:#d04437}ZooKeeper (Discovery) 1{color} [[tests 0 TIMEOUT , Exit Code
|https://ci.ignite.apache.org/viewLog.html?buildId=3395256]]
* ZookeeperDiscoverySpiTest.testDisconnectOnServersLeft_3 (last started)
{color:#d04437}Client Nodes{color} [[tests 0 TIMEOUT , Exit Code
|https://ci.ignite.apache.org/viewLog.html?buildId=3395258]]
* IgniteClientRejoinTest.testClientsReconnect (last started)
{color:#d04437}Cache 3{color} [[tests 0 TIMEOUT , Exit Code
|https://ci.ignite.apache.org/viewLog.html?buildId=3395266]]
* IgniteCacheGroupsTest.testRestartsAndCacheCreateDestroy (last started)
{color:#d04437}Platform C++ (Linux Clang){color} [[tests 0 Exit Code , Failure
on metric |https://ci.ignite.apache.org/viewLog.html?buildId=3395274]]
{color:#d04437}Hibernate 5.3{color} [[tests 0 Exit Code
|https://ci.ignite.apache.org/viewLog.html?buildId=3395282]]
{color:#d04437}Thin client: PHP{color} [[tests 0 Exit Code
|https://ci.ignite.apache.org/viewLog.html?buildId=3395280]]
{color:#d04437}Thin client: Node.js{color} [[tests 0 Exit Code
|https://ci.ignite.apache.org/viewLog.html?buildId=3395286]]
{color:#d04437}Thin client: Python{color} [[tests 0 Exit Code
|https://ci.ignite.apache.org/viewLog.html?buildId=3395290]]
{color:#d04437}Spring (Data){color} [[tests 0 Exit Code
|https://ci.ignite.apache.org/viewLog.html?buildId=3395294]]
{color:#d04437}Cache 1{color} [[tests
11|https://ci.ignite.apache.org/viewLog.html?buildId=3395262]]
* IgniteBinaryCacheTestSuite:
DataStreamerClientReconnectAfterClusterRestartTest.testTwoClientsAllowOverwrite
- 0,0% fails in last 405 master runs.
* IgniteBinaryCacheTestSuite:
DataStreamerClientReconnectAfterClusterRestartTest.testOneClientAllowOverwrite
- 0,0% fails in last 405 master runs.
* IgniteBinaryCacheTestSuite:
DataStreamerClientReconnectAfterClusterRestartTest.testTwoClients - 0,0% fails
in last 405 master runs.
* IgniteBinaryCacheTestSuite:
DataStreamerClientReconnectAfterClusterRestartTest.testOneClient - 0,0% fails
in last 405 master runs.
{color:#d04437}Queries 1{color} [[tests
6|https://ci.ignite.apache.org/viewLog.html?buildId=3395260]]
* IgniteBinaryCacheQueryTestSuite:
SchemaExchangeSelfTest.testServerRestartWithNewTypes - 0,0% fails in last 409
master runs.
{color:#d04437}PDS (Indexing){color} [[tests 4 Out Of Memory Error
|https://ci.ignite.apache.org/viewLog.html?buildId=3395264]]
* IgnitePdsWithIndexingCoreTestSuite:
IgniteLogicalRecoveryTest.testRecoveryOnJoinToDifferentBlt - 0,0% fails in last
398 master runs.
* IgnitePdsWithIndexingCoreTestSuite:
IgniteLogicalRecoveryTest.testRecoveryOnDynamicallyStartedCaches - 0,0% fails
in last 398 master runs.
* IgnitePdsWithIndexingCoreTestSuite:
IgnitePdsThreadInterruptionTest.testInterruptsOnWALWrite - 0,0% fails in last
398 master runs.
* IgniteLogicalRecoveryTest.testRecoveryOnDynamicallyStartedCaches (last
started)
{color:#d04437}Queries 2{color} [[tests
14|https://ci.ignite.apache.org/viewLog.html?buildId=3395268]]
* IgniteBinaryCacheQueryTestSuite2:
DynamicColumnsConcurrentTransactionalReplicatedSelfTest.testClientReconnectWithCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicColumnsConcurrentAtomicPartitionedSelfTest.testClientReconnectWithCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicIndexPartitionedTransactionalConcurrentSelfTest.testClientReconnectWithCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicColumnsConcurrentTransactionalPartitionedSelfTest.testClientReconnectWithNonDynamicCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicColumnsConcurrentTransactionalPartitionedSelfTest.testClientReconnectWithCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
IgniteCacheQueryNodeRestartSelfTest2.testRestarts - 0,0% fails in last 0 master
runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicColumnsConcurrentAtomicReplicatedSelfTest.testClientReconnectWithNonDynamicCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicIndexReplicatedAtomicConcurrentSelfTest.testClientReconnectWithCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicColumnsConcurrentAtomicReplicatedSelfTest.testClientReconnectWithCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicIndexPartitionedAtomicConcurrentSelfTest.testClientReconnectWithCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicColumnsConcurrentTransactionalReplicatedSelfTest.testClientReconnectWithNonDynamicCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicIndexReplicatedTransactionalConcurrentSelfTest.testClientReconnectWithCacheRestart
- 0,0% fails in last 414 master runs.
* IgniteBinaryCacheQueryTestSuite2:
DynamicColumnsConcurrentAtomicPartitionedSelfTest.testClientReconnectWithNonDynamicCacheRestart
- 0,0% fails in last 414 master runs.
{color:#d04437}ZooKeeper (Discovery) 2{color} [[tests
5|https://ci.ignite.apache.org/viewLog.html?buildId=3395270]]
* ZookeeperDiscoverySpiTestSuite2:
IgniteClientReconnectCacheTest.testReconnectClusterRestart - 0,0% fails in last
406 master runs.
* ZookeeperDiscoverySpiTestSuite2: IgniteClientDataStructuresTest.testSequence
* ZookeeperDiscoverySpiTestSuite2:
IgniteClientReconnectCacheTest.testReconnectCacheDestroyedAndCreated - 0,0%
fails in last 406 master runs.
* ZookeeperDiscoverySpiTestSuite2:
GridCacheReplicatedNodeRestartSelfTest.testRestartWithTxEightNodesTwoBackups
{color:#d04437}Cache 2{color} [[tests
3|https://ci.ignite.apache.org/viewLog.html?buildId=3395272]]
* IgniteCacheTestSuite2:
IgniteCacheClientNodeChangingTopologyTest.testPessimisticTx2 - 0,0% fails in
last 405 master runs.
* IgniteCacheTestSuite2:
IgniteCacheClientNodeChangingTopologyTest.testOptimisticTxPutAllMultinode -
0,0% fails in last 405 master runs.
* IgniteCacheTestSuite2:
IgniteClientCacheStartFailoverTest.testClientStartLastServerFailsTx - 0,0%
fails in last 405 master runs.
{color:#d04437}Continuous Query 1{color} [[tests
2|https://ci.ignite.apache.org/viewLog.html?buildId=3395278]]
* IgniteCacheQuerySelfTestSuite3:
CacheContinuousWithTransformerReplicatedSelfTest.testContinuousWithTransformerAndRegularListenerAsync
- 0,0% fails in last 413 master runs.
* IgniteCacheQuerySelfTestSuite3:
CacheContinuousQueryConcurrentPartitionUpdateTest.testConcurrentUpdatesAndQueryStartTx
- 0,0% fails in last 413 master runs.
{color:#d04437}Web Sessions{color} [[tests
4|https://ci.ignite.apache.org/viewLog.html?buildId=3395288]]
* IgniteWebSessionSelfTestSuite: WebSessionSelfTest.testClientReconnectRequest
- 0,0% fails in last 412 master runs.
{color:#d04437}Basic 3{color} [[tests
1|https://ci.ignite.apache.org/viewLog.html?buildId=3395292]]
* IgniteBasicWithPersistenceTestSuite:
PluginNodeValidationTest.testValidationException
{color:#d04437}Platform C++ (Win x64 | Release){color} [[tests 5 Failure on
metric , BuildFailureOnMessage
|https://ci.ignite.apache.org/viewLog.html?buildId=3395276]]
* IgniteOdbcTest: QueriesTestSuite: TestManyCursorsSelectMerge2 - 0,6% fails in
last 824 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestManyCursorsTwoSelects2 - 0,6% fails in
last 824 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestInsertBatchSelect2049 - 0,6% fails in
last 824 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestInsertBatchSelect100 - 0,6% fails in
last 824 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestNotFullInsertBatchSelect1500 - 0,6%
fails in last 824 master runs.
{panel}
[TeamCity *--> Run :: All*
Results|https://ci.ignite.apache.org/viewLog.html?buildId=3372451&buildTypeId=IgniteTests24Java8_RunAll]
> Ignite watchdog service
> -----------------------
>
> Key: IGNITE-6587
> URL: https://issues.apache.org/jira/browse/IGNITE-6587
> Project: Ignite
> Issue Type: Improvement
> Components: general
> Affects Versions: 2.2
> Reporter: Alexey Goncharuk
> Assignee: Andrey Kuznetsov
> Priority: Major
> Labels: IEP-5
> Fix For: 2.7
>
> Attachments: watchdog.sh
>
>
> As described in [1], each Ignite node has a number of system-critical
> threads. We should implement a periodic check that calls the failure handler
> when one of the following conditions is detected:
> * A critical thread is no longer alive.
> * A critical thread 'hangs' for a long time, e.g. while executing a task
> extracted from the task queue.
> In case of a failure condition, the call stacks of all threads should be
> logged before invoking the failure handler.
> The actual list of system-critical threads can be found at [1].
> Implementations based on a separate diagnostic thread seem fragile, because
> this thread becomes a vulnerable point with respect to thread termination and
> CPU resource starvation. So we are to use a self-monitoring approach: critical
> threads themselves should monitor each other.
> Currently we have the {{o.a.i.internal.worker.WorkersRegistry}} facility,
> which is the best fit for storing and tracking system-critical threads. All of
> them should be refactored into {{GridWorker}}s and added to
> {{WorkersRegistry}}. Each worker should periodically choose some subset of
> peer workers and check whether
> * All of them are alive.
> * All of them are actively running.
> A 'heartbeat' timestamp must be added to the worker in order to implement
> the latter check. Additionally, infinite queue polls, waits on monitors, and
> thread parks should be refactored to their timed equivalents in
> system-critical threads.
> Monitoring parameters (enable/disable, check interval, thread 'hang'
> threshold, etc.) are to be set via system properties.
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling
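
The self-monitoring scheme described in the issue could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Ignite implementation: {{WatchdogSketch}}, {{MonitoredWorker}}, {{Registry}}, {{checkPeers}} and {{runOnce}} are hypothetical names and do not reflect the actual {{GridWorker}} or {{WorkersRegistry}} API. For simplicity each worker checks all of its peers rather than a subset, the hang threshold is a constructor argument rather than a system property, and the thread dump that should precede the failure handler call is omitted.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of peer self-monitoring; not the Ignite API. */
final class WatchdogSketch {
    /** A monitored worker: its thread plus the time it last reported progress. */
    static final class MonitoredWorker {
        final String name;
        final Thread thread;
        private volatile long heartbeatMs;

        MonitoredWorker(String name, Thread thread, long nowMs) {
            this.name = name;
            this.thread = thread;
            this.heartbeatMs = nowMs;
        }

        /** Called by the worker itself after each task or timed wait. */
        void updateHeartbeat(long nowMs) {
            heartbeatMs = nowMs;
        }

        long heartbeatMs() {
            return heartbeatMs;
        }
    }

    /** Holds all monitored workers and reports failures to a handler. */
    static final class Registry {
        private final Map<String, MonitoredWorker> workers = new ConcurrentHashMap<>();
        private final long hangThresholdMs;
        private final Consumer<String> failureHandler;

        Registry(long hangThresholdMs, Consumer<String> failureHandler) {
            this.hangThresholdMs = hangThresholdMs;
            this.failureHandler = failureHandler;
        }

        void register(MonitoredWorker w) {
            workers.put(w.name, w);
        }

        /** Invoked by a worker from its own loop; there is no separate watchdog thread. */
        void checkPeers(MonitoredWorker self, long nowMs) {
            for (MonitoredWorker peer : workers.values()) {
                if (peer == self)
                    continue;

                if (!peer.thread.isAlive())
                    failureHandler.accept("terminated: " + peer.name);
                else if (nowMs - peer.heartbeatMs() > hangThresholdMs)
                    failureHandler.accept("blocked: " + peer.name);
            }
        }
    }

    /** One loop iteration: a timed poll replaces an unbounded take(). */
    static void runOnce(BlockingQueue<Runnable> queue, MonitoredWorker self,
        Registry reg, long pollMs) throws InterruptedException {
        Runnable task = queue.poll(pollMs, TimeUnit.MILLISECONDS);

        if (task != null)
            task.run();

        long now = System.currentTimeMillis();

        self.updateHeartbeat(now); // Progress was made, even if the queue was empty.
        reg.checkPeers(self, now);
    }
}
```

Because the poll is timed, an idle worker still refreshes its heartbeat on every iteration, so only a worker stuck inside a task (or one whose thread has died) is reported by its peers.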