OK, I got that part. There are five nodes (separate NUC machines), and the first node runs both the CC and an NC.
Best,
Taewoo

On Sun, Sep 30, 2018 at 1:15 PM, Till Westmann <[email protected]> wrote:

> Yes, and after 10 * 8000 ms = 80s have passed without a heartbeat, the node
> is considered dead. This seems to align with the log messages:
>
> 21:51:54.203 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 2 to shutdown to ensure failure
> 21:51:54.206 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown failed node 2 succeeded. false positive heartbeat miss indication
> 21:51:54.206 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - 2 considered dead. Last heartbeat received 83799ms ago. Max miss period: 80000ms
>
> So that looks consistent.
> The question seems to be (unless I missed something) why no heartbeats
> were received for 80s.
>
> And it also looks like this happened for more than one node at the same
> time. Are these nodes running on a single box or on multiple boxes?
>
> Cheers,
> Till
>
> On 30 Sep 2018, at 13:03, Taewoo Kim wrote:
>
>> P.S. We set the heartbeat settings as follows.
>>
>> [cc]
>> heartbeat.max.misses=10
>> heartbeat.period=8000
>>
>> Best,
>> Taewoo
>>
>> On Sun, Sep 30, 2018 at 12:50 PM, Taewoo Kim <[email protected]> wrote:
>>
>>> Hi Till,
>>>
>>> These files are indeed the same log file. I just extracted one line from
>>> the original log and placed it at the beginning. At 9:50 pm, the CC
>>> initiated a task cluster and all NCs were executing. Node 2 sent its
>>> task-complete messages, and somehow the CC decided to shut down node 2.
>>> That's my understanding.
>>>
>>> Best,
>>> Taewoo
>>>
>>> On Sun, Sep 30, 2018 at 12:40 PM, Till Westmann <[email protected]> wrote:
>>>
>>>> Hi Taewoo,
>>>>
>>>> I'm a little confused, as we seem to have two cc logs.
>>>>
>>>> The "Requesting node 2 to shutdown …" message is logged when nodes are
>>>> considered dead because they have missed too many heartbeats. And you
>>>> can see in the later logs that indeed more than 80s passed without
>>>> receiving a heartbeat.
>>>> Could it be that the machines were too busy to send/receive heartbeats?
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On 30 Sep 2018, at 10:52, Taewoo Kim wrote:
>>>>
>>>>> Hi devs,
>>>>>
>>>>> I saw the following log messages in the cc.log of the Cloudberry
>>>>> cluster. Yesterday around 9:50 pm, a task cluster was created and
>>>>> executed. And around 9:51 pm, the cluster showed the following
>>>>> message. In the nc.log of node 2, there were no noticeable log
>>>>> messages. I wonder what could cause this situation.
>>>>>
>>>>> -- cc.log at 9:51 pm
>>>>> *21:51:54.203 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 2 to shutdown to ensure failure*
>>>>>
>>>>> -- nc-2.log around 9:50 pm
>>>>> 21:50:30.314 [Worker:2] INFO org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:2
>>>>> 21:50:30.375 [Worker:2] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:7:0:3:0
>>>>> 21:50:30.382 [Worker:2] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:9:0:3:0
>>>>> 21:50:30.390 [Worker:2] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:9:0:2:0
>>>>> 21:50:30.391 [Worker:2] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:7:0:2:0
>>>>> 21:50:30.392 [Worker:2] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:3:0
>>>>> 21:50:30.392 [Worker:2] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:2:0
>>>>> 21:50:30.398 [Worker:2] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:3:0
>>>>> 21:50:30.398 [Worker:2] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:1:1:2:0
>>>>> 22:02:30.290 [JVM exit thread] INFO org.apache.hyracks.util.ExitUtil - JVM exiting with status 0; bye!
>>>>>
>>>>> -- cc.log from 9:51 pm
>>>>> 21:50:30.313 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Built 1 Task Clusters
>>>>> 21:50:30.313 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Tasks: [TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2, TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5, TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8, TID:ANID:ODID:1:1:9, TID:ANID:ODID:2:0:0, TID:ANID:ODID:2:0:1, TID:ANID:ODID:2:0:2, TID:ANID:ODID:2:0:3, TID:ANID:ODID:2:0:4, TID:ANID:ODID:2:0:5, TID:ANID:ODID:2:0:6, TID:ANID:ODID:2:0:7, TID:ANID:ODID:2:0:8, TID:ANID:ODID:2:0:9, TID:ANID:ODID:9:0:0, TID:ANID:ODID:9:0:1, TID:ANID:ODID:9:0:2, TID:ANID:ODID:9:0:3, TID:ANID:ODID:9:0:4, TID:ANID:ODID:9:0:5, TID:ANID:ODID:9:0:6, TID:ANID:ODID:9:0:7, TID:ANID:ODID:9:0:8, TID:ANID:ODID:9:0:9]
>>>>> 21:50:30.313 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots: [TC:[TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2, TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5, TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8, TID:ANID:ODID:1:1:9, TID:ANID:ODID:2:0:0, TID:ANID:ODID:2:0:1, TID:ANID:ODID:2:0:2, TID:ANID:ODID:2:0:3, TID:ANID:ODID:2:0:4, TID:ANID:ODID:2:0:5, TID:ANID:ODID:2:0:6, TID:ANID:ODID:2:0:7, TID:ANID:ODID:2:0:8, TID:ANID:ODID:2:0:9, TID:ANID:ODID:9:0:0, TID:ANID:ODID:9:0:1, TID:ANID:ODID:9:0:2, TID:ANID:ODID:9:0:3, TID:ANID:ODID:9:0:4, TID:ANID:ODID:9:0:5, TID:ANID:ODID:9:0:6, TID:ANID:ODID:9:0:7, TID:ANID:ODID:9:0:8, TID:ANID:ODID:9:0:9]], inProgressTaskClusters: []
>>>>> 21:50:30.366 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:7:0:5:0]
>>>>> 21:50:30.374 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:7:0:9:0]
>>>>> 21:50:30.376 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:7:0:3:0]
>>>>> 21:50:30.376 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:7:0:8:0]
>>>>> 21:50:30.377 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:7:0:0:0]
>>>>> 21:50:30.378 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:7:0:4:0]
>>>>> 21:50:30.381 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:7:0:1:0]
>>>>> 21:50:30.382 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:9:0:8:0]
>>>>> 21:50:30.383 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:9:0:3:0]
>>>>> 21:50:30.384 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:9:0:4:0]
>>>>> 21:50:30.386 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:9:0:5:0]
>>>>> 21:50:30.386 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:9:0:9:0]
>>>>> 21:50:30.389 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:9:0:6:0]
>>>>> 21:50:30.389 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:9:0:0:0]
>>>>> 21:50:30.392 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:9:0:2:0]
>>>>> 21:50:30.392 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:7:0:7:0]
>>>>> 21:50:30.392 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:7:0:6:0]
>>>>> 21:50:30.394 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:9:0:1:0]
>>>>> 21:50:30.395 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:7:0:2:0]
>>>>> 21:50:30.395 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:9:0:7:0]
>>>>> 21:50:30.396 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:1:1:1:0]
>>>>> 21:50:30.396 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:1:1:0:0]
>>>>> 21:50:30.396 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [1[JID:0.700:TAID:TID:ANID:ODID:1:1:0:0]
>>>>> 21:50:30.396 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [1[JID:0.701:TAID:TID:ANID:ODID:1:1:1:0]
>>>>> 21:50:30.397 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:1:1:8:0]
>>>>> 21:50:30.398 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:1:1:4:0]
>>>>> 21:50:30.398 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:1:1:5:0]
>>>>> 21:50:30.398 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:1:1:4:0]
>>>>> 21:50:30.398 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.701:TAID:TID:ANID:ODID:1:1:5:0]
>>>>> 21:50:30.399 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:1:1:6:0]
>>>>> 21:50:30.399 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:1:1:3:0]
>>>>> 21:50:30.399 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [2[JID:0.700:TAID:TID:ANID:ODID:1:1:2:0]
>>>>> 21:50:30.399 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [5[JID:0.700:TAID:TID:ANID:ODID:1:1:9:0]
>>>>> 21:50:30.399 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:1:1:8:0]
>>>>> 21:50:30.399 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [5[JID:0.701:TAID:TID:ANID:ODID:1:1:9:0]
>>>>> 21:50:30.402 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [4[JID:0.700:TAID:TID:ANID:ODID:1:1:7:0]
>>>>> 21:50:30.402 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:1:1:6:0]
>>>>> 21:50:30.402 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [4[JID:0.701:TAID:TID:ANID:ODID:1:1:7:0]
>>>>> 21:50:30.402 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:5:0]
>>>>> 21:50:30.402 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:4:0]
>>>>> 21:50:30.402 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:5:0]
>>>>> 21:50:30.402 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [3[JID:0.700:TAID:TID:ANID:ODID:2:0:4:0]
>>>>> 21:50:30.404 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:1:1:3:0]
>>>>> 21:50:30.404 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [2[JID:0.701:TAID:TID:ANID:ODID:1:1:2:0]
>>>>> *21:51:54.203 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 2 to shutdown to ensure failure*
>>>>> 21:51:54.206 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown failed node 2 succeeded. false positive heartbeat miss indication
>>>>> 21:51:54.206 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - 2 considered dead. Last heartbeat received 83799ms ago. Max miss period: 80000ms
>>>>> 21:51:54.206 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 5 to shutdown to ensure failure
>>>>> 21:51:54.206 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown failed node 5 succeeded. false positive heartbeat miss indication
>>>>> 21:51:54.206 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - 5 considered dead. Last heartbeat received 83806ms ago. Max miss period: 80000ms
>>>>> 21:51:54.207 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 3 to shutdown to ensure failure
>>>>> 21:51:54.207 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown failed node 3 succeeded. false positive heartbeat miss indication
>>>>> 21:51:54.207 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - 3 considered dead. Last heartbeat received 83804ms ago. Max miss period: 80000ms
>>>>> 21:51:54.207 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 4 to shutdown to ensure failure
>>>>> 21:51:54.207 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Request to shutdown failed node 4 succeeded. false positive heartbeat miss indication
>>>>> 21:51:54.207 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - 4 considered dead. Last heartbeat received 83804ms ago. Max miss period: 80000ms
>>>>> 21:51:54.207 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.work.RemoveDeadNodesWork - Number of affected jobs: 3
>>>>> 21:51:54.216 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job: JID:0.1: {1=[TAID:TID:ANID:ODID:0:0:0:0, TAID:TID:ANID:ODID:3:0:0:0, TAID:TID:ANID:ODID:3:0:1:0]}
>>>>> 21:51:54.216 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Aborting: [TAID:TID:ANID:ODID:0:0:0:0, TAID:TID:ANID:ODID:3:0:0:0, TAID:TID:ANID:ODID:3:0:1:0] at 1
>>>>> 21:51:54.217 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing uncommitted partitions: []
>>>>> 21:51:54.217 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing partition requests: []
>>>>> 21:51:54.218 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - aborting doomed task clusters
>>>>> 21:51:54.218 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - number of doomed task clusters found = 0
>>>>> 21:51:54.218 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots: [TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:3:0:0, TID:ANID:ODID:3:0:1, TID:ANID:ODID:3:0:2, TID:ANID:ODID:3:0:3, TID:ANID:ODID:3:0:4, TID:ANID:ODID:3:0:5, TID:ANID:ODID:3:0:6, TID:ANID:ODID:3:0:7, TID:ANID:ODID:3:0:8, TID:ANID:ODID:3:0:9]], inProgressTaskClusters: []
>>>>> 21:51:54.218 [Worker:ClusterController] ERROR org.apache.hyracks.control.cc.executor.JobExecutor - Unexpected failure. Aborting job JID:0.1
>>>>> org.apache.hyracks.api.exceptions.HyracksException: Node 2 not live
>>>>>         at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>>         at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>>         at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>>         at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>>         at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:731) [hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>>         at org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(RemoveDeadNodesWork.java:60) [hyracks-control-cc-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>>         at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [hyracks-control-common-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT]
>>>>> 21:51:54.218 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job: JID:0.700: {}
>>>>> 21:51:54.218 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing uncommitted partitions: []
>>>>> 21:51:54.218 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing partition requests: []
>>>>> 21:51:54.218 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - aborting doomed task clusters
>>>>> 21:51:54.219 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - number of doomed task clusters found = 0
>>>>> 21:51:54.219 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job: JID:0.700: {1=[TAID:TID:ANID:ODID:2:0:0:0, TAID:TID:ANID:ODID:2:0:1:0]}
>>>>>
>>>>> Best,
>>>>> Taewoo
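
For reference, the liveness arithmetic discussed in the thread can be condensed into a small sketch. This is an illustrative toy, not the actual Hyracks NodeManager code; the class and method names (HeartbeatLivenessSketch, isDead) are made up for this example. It shows how heartbeat.period=8000 and heartbeat.max.misses=10 combine into the 80000ms "Max miss period" reported in the cc.log, and why a gap of 83799ms gets a node declared dead:

    // Illustrative sketch only -- not the actual Hyracks NodeManager
    // implementation. A node is declared dead once no heartbeat has been
    // received for heartbeat.max.misses * heartbeat.period milliseconds.
    public class HeartbeatLivenessSketch {

        private final long heartbeatPeriodMs; // heartbeat.period (8000 in this cluster)
        private final int maxMisses;          // heartbeat.max.misses (10 in this cluster)

        public HeartbeatLivenessSketch(long heartbeatPeriodMs, int maxMisses) {
            this.heartbeatPeriodMs = heartbeatPeriodMs;
            this.maxMisses = maxMisses;
        }

        // Dead iff the time since the last heartbeat exceeds the max miss period.
        public boolean isDead(long lastHeartbeatMs, long nowMs) {
            long maxMissPeriodMs = maxMisses * heartbeatPeriodMs; // 10 * 8000 = 80000ms
            return nowMs - lastHeartbeatMs > maxMissPeriodMs;
        }

        public static void main(String[] args) {
            HeartbeatLivenessSketch check = new HeartbeatLivenessSketch(8000, 10);
            long now = System.currentTimeMillis();
            // Mirrors the log line "2 considered dead. Last heartbeat received
            // 83799ms ago. Max miss period: 80000ms": 83799 > 80000.
            System.out.println(check.isDead(now - 83799, now)); // prints: true
        }
    }

Under a check of this shape, any stall longer than 80s on either end (an NC too busy to send heartbeats, or a CC too busy to process them) produces exactly the "considered dead" messages quoted above, which is consistent with Till's question about the machines being too busy.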
