Hi devs,
I saw the following log messages in the cc.log of the Cloudberrry
cluster.
Yesterday around 9:50 pm, a task cluster was created and executed. And
around 9:51 pm, the cluster shows the following message. In the nc.log
of
the node 2, there were no noticeable log messages. I wonder what could
cause this situation.
-- cc.log at 9:51 pm
*21:51:54.203 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 2
to
shutdown to ensure failure*
*-- nc-2.log around 9:50 pm*
21:50:30.314 [Worker:2] INFO
org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:2
21:50:30.375 [Worker:2] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:7:0:3:0
21:50:30.382 [Worker:2] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:9:0:3:0
21:50:30.390 [Worker:2] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:9:0:2:0
21:50:30.391 [Worker:2] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:7:0:2:0
21:50:30.392 [Worker:2] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:3:0
21:50:30.392 [Worker:2] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:2:0
21:50:30.398 [Worker:2] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:3:0
21:50:30.398 [Worker:2] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:2:0
22:02:30.290 [JVM exit thread] INFO org.apache.hyracks.util.ExitUtil
- JVM
exiting with status 0; bye!
-- *cc.log from 9:51 pm*
21:50:30.313 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Built
1
Task Clusters
21:50:30.313 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.ActivityClusterPlanner -
Tasks:
[TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2,
TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5,
TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8,
TID:ANID:ODID:1:1:9, TID:ANID:ODID:2:0:0, TID:ANID:ODID:2:0:1,
TID:ANID:ODID:2:0:2, TID:ANID:ODID:2:0:3, TID:ANID:ODID:2:0:4,
TID:ANID:ODID:2:0:5, TID:ANID:ODID:2:0:6, TID:ANID:ODID:2:0:7,
TID:ANID:ODID:2:0:8, TID:ANID:ODID:2:0:9, TID:ANID:ODID:9:0:0,
TID:ANID:ODID:9:0:1, TID:ANID:ODID:9:0:2, TID:ANID:ODID:9:0:3,
TID:ANID:ODID:9:0:4, TID:ANID:ODID:9:0:5, TID:ANID:ODID:9:0:6,
TID:ANID:ODID:9:0:7, TID:ANID:ODID:9:0:8, TID:ANID:ODID:9:0:9]
21:50:30.313 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC
roots:
[TC:[TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2,
TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5,
TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8,
TID:ANID:ODID:1:1:9, TID:ANID:ODID:2:0:0, TID:ANID:ODID:2:0:1,
TID:ANID:ODID:2:0:2, TID:ANID:ODID:2:0:3, TID:ANID:ODID:2:0:4,
TID:ANID:ODID:2:0:5, TID:ANID:ODID:2:0:6, TID:ANID:ODID:2:0:7,
TID:ANID:ODID:2:0:8, TID:ANID:ODID:2:0:9, TID:ANID:ODID:9:0:0,
TID:ANID:ODID:9:0:1, TID:ANID:ODID:9:0:2, TID:ANID:ODID:9:0:3,
TID:ANID:ODID:9:0:4, TID:ANID:ODID:9:0:5, TID:ANID:ODID:9:0:6,
TID:ANID:ODID:9:0:7, TID:ANID:ODID:9:0:8, TID:ANID:ODID:9:0:9]],
inProgressTaskClusters: []
21:50:30.366 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:7:0:5:0]
21:50:30.374 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:7:0:9:0]
21:50:30.376 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:7:0:3:0]
21:50:30.376 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:7:0:8:0]
21:50:30.377 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:7:0:0:0]
21:50:30.378 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:7:0:4:0]
21:50:30.381 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:7:0:1:0]
21:50:30.382 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:9:0:8:0]
21:50:30.383 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:9:0:3:0]
21:50:30.384 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:9:0:4:0]
21:50:30.386 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:9:0:5:0]
21:50:30.386 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:9:0:9:0]
21:50:30.389 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:9:0:6:0]
21:50:30.389 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:9:0:0:0]
21:50:30.392 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:9:0:2:0]
21:50:30.392 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:7:0:7:0]
21:50:30.392 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:7:0:6:0]
21:50:30.394 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:9:0:1:0]
21:50:30.395 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:7:0:2:0]
21:50:30.395 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:9:0:7:0]
21:50:30.396 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:1:1:1:0]
21:50:30.396 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:1:1:0:0]
21:50:30.396 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:1:1:0:0]
21:50:30.396 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:1:1:1:0]
21:50:30.397 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:1:1:8:0]
21:50:30.398 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:1:1:4:0]
21:50:30.398 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:1:1:5:0]
21:50:30.398 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:1:1:4:0]
21:50:30.398 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:1:1:5:0]
21:50:30.399 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:1:1:6:0]
21:50:30.399 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:1:1:3:0]
21:50:30.399 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:1:1:2:0]
21:50:30.399 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:1:1:9:0]
21:50:30.399 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:1:1:8:0]
21:50:30.399 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:1:1:9:0]
21:50:30.402 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:1:1:7:0]
21:50:30.402 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:1:1:6:0]
21:50:30.402 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:1:1:7:0]
21:50:30.402 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:5:0]
21:50:30.402 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:4:0]
21:50:30.402 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:5:0]
21:50:30.402 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:4:0]
21:50:30.404 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:1:1:3:0]
21:50:30.404 [Worker:ClusterController] INFO
org.apache.hyracks.control.common.work.WorkQueue - Executing:
TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:1:1:2:0]
*21:51:54.203 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 2
to
shutdown to ensure failure*
21:51:54.206 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - Request to
shutdown
failed node 2 succeeded. false positive heartbeat miss indication
21:51:54.206 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - 2 considered
dead.
Last heartbeat received 83799ms ago. Max miss period: 80000ms
21:51:54.206 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 5
to
shutdown to ensure failure
21:51:54.206 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - Request to
shutdown
failed node 5 succeeded. false positive heartbeat miss indication
21:51:54.206 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - 5 considered
dead.
Last heartbeat received 83806ms ago. Max miss period: 80000ms
21:51:54.207 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 3
to
shutdown to ensure failure
21:51:54.207 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - Request to
shutdown
failed node 3 succeeded. false positive heartbeat miss indication
21:51:54.207 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - 3 considered
dead.
Last heartbeat received 83804ms ago. Max miss period: 80000ms
21:51:54.207 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 4
to
shutdown to ensure failure
21:51:54.207 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - Request to
shutdown
failed node 4 succeeded. false positive heartbeat miss indication
21:51:54.207 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.cluster.NodeManager - 4 considered
dead.
Last heartbeat received 83804ms ago. Max miss period: 80000ms
21:51:54.207 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.work.RemoveDeadNodesWork - Number of
affected jobs: 3
21:51:54.216 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for
job:
JID:0.1: {1=[TAID:TID:ANID:ODID:0:0:0:0, TAID:TID:ANID:ODID:3:0:0:0,
TAID:TID:ANID:ODID:3:0:1:0]}
21:51:54.216 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - Aborting:
[TAID:TID:ANID:ODID:0:0:0:0, TAID:TID:ANID:ODID:3:0:0:0,
TAID:TID:ANID:ODID:3:0:1:0] at 1
21:51:54.217 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.partitions.PartitionMatchMaker -
Removing
uncommitted partitions: []
21:51:54.217 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.partitions.PartitionMatchMaker -
Removing
partition requests: []
21:51:54.218 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - aborting doomed
task
clusters
21:51:54.218 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - number of doomed
task
clusters found = 0
21:51:54.218 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC
roots:
[TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:3:0:0, TID:ANID:ODID:3:0:1,
TID:ANID:ODID:3:0:2, TID:ANID:ODID:3:0:3, TID:ANID:ODID:3:0:4,
TID:ANID:ODID:3:0:5, TID:ANID:ODID:3:0:6, TID:ANID:ODID:3:0:7,
TID:ANID:ODID:3:0:8, TID:ANID:ODID:3:0:9]], inProgressTaskClusters: []
21:51:54.218 [Worker:ClusterController] ERROR
org.apache.hyracks.control.cc.executor.JobExecutor - Unexpected
failure.
Aborting job JID:0.1
org.apache.hyracks.api.exceptions.HyracksException: Node 2 not live
at
org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473)
~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
at
org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365)
~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
at
org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245)
~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
at
org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209)
~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
at
org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:731)
[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
at
org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(RemoveDeadNodesWork.java:60)
[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
at
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
[hyracks-control-common-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
21:51:54.218 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for
job:
JID:0.700: {}
21:51:54.218 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.partitions.PartitionMatchMaker -
Removing
uncommitted partitions: []
21:51:54.218 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.partitions.PartitionMatchMaker -
Removing
partition requests: []
21:51:54.218 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - aborting doomed
task
clusters
21:51:54.219 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - number of doomed
task
clusters found = 0
21:51:54.219 [Worker:ClusterController] INFO
org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for
job:
JID:0.700: {1=[TAID:TID:ANID:ODID:2:0:0:0,
TAID:TID:ANID:ODID:2:0:1:0]}
Best,
Taewoo