[ https://issues.apache.org/jira/browse/ASTERIXDB-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401253#comment-16401253 ]
Taewoo Kim commented on ASTERIXDB-2185:
---------------------------------------

[~mhubail] : thanks for the investigation. Unfortunately, there are no more CC and NC log records. The entire cluster was wiped out and rebuilt. If this happens again, I will attach CC and NC log records. Thanks.

> Cluster becomes UNUSABLE status after a NC fails to send a job failure.
> -----------------------------------------------------------------------
>
>                 Key: ASTERIXDB-2185
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2185
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: IDX - Indexes, RT - Runtime
>            Reporter: Taewoo Kim
>            Assignee: Murtadha Hubail
>            Priority: Major
>              Labels: triaged
>
> A cluster became UNUSABLE status after a NC failed to send a job failure
> message. See the exception below.
> {code}
> Dec 03, 2017 6:47:13 PM org.apache.hyracks.control.nc.work.StartTasksWork run
> INFO: Initializing TAID:TID:ANID:ODID:16:0:1:0 -> [Asterix {
>   ets;
>   assign [0, 1, 2] := [Constant, Constant, Constant];
> },
> org.apache.hyracks.storage.am.lsm.invertedindex.dataflow.LSMInvertedIndexSearchOperatorDescriptor@23d902c1,
> org.apache.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$1@2fc09944]
> Dec 03, 2017 6:47:13 PM org.apache.hyracks.dataflow.std.sort.AbstractSorterOperatorDescriptor$SortActivity$1 close
> INFO: InitialNumberOfRuns:0
> Dec 03, 2017 6:47:13 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:13:0:1:0
> Dec 03, 2017 6:47:13 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:13:0:0:0
> Dec 03, 2017 6:47:13 PM org.apache.hyracks.dataflow.std.sort.AbstractSorterOperatorDescriptor$SortActivity$1 close
> INFO: InitialNumberOfRuns:0
> Dec 03, 2017 6:47:13 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:16:0:0:0
> Dec 03, 2017 6:47:13 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:16:0:1:0
> Dec 03, 2017 6:48:02 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: AbortTasks
> Dec 03, 2017 6:48:02 PM org.apache.hyracks.control.nc.work.AbortTasksWork run
> INFO: Aborting Tasks: JID:0:[TAID:TID:ANID:ODID:0:0:0:0, TAID:TID:ANID:ODID:3:0:0:0, TAID:TID:ANID:ODID:3:0:1:0]
> Dec 03, 2017 6:48:02 PM org.apache.hyracks.control.nc.Task run
> WARNING: Task TAID:TID:ANID:ODID:3:0:0:0 failed with exception
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
>     at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:325)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:744)
> Dec 03, 2017 6:48:02 PM org.apache.hyracks.control.nc.Task run
> WARNING: Task TAID:TID:ANID:ODID:3:0:1:0 failed with exception
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
>     at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:325)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:744)
> Dec 03, 2017 6:48:02 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: NotifyTaskFailure
> Dec 03, 2017 6:48:02 PM org.apache.hyracks.control.nc.work.NotifyTaskFailureWork run
> WARNING: 1 is sending a notification to cc that task TAID:TID:ANID:ODID:3:0:0:0 has failed
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0003: java.lang.InterruptedException
>     at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:68)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:367)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
>     at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:325)
>     ... 3 more
>
>
> ...... Same exception was repeated for several times ......
> Dec 03, 2017 6:48:02 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: NotifyTaskFailure
> Dec 03, 2017 6:48:02 PM org.apache.hyracks.control.nc.work.NotifyTaskFailureWork run
> WARNING: 1 is sending a notification to cc that task TAID:TID:ANID:ODID:3:0:0:0 has failed
> org.apache.hyracks.api.exceptions.HyracksDataException: HYR0003: java.lang.InterruptedException
>     at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:68)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:367)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
>     at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:325)
>     ... 3 more
> Dec 03, 2017 6:48:02 PM org.apache.hyracks.control.nc.Joblet close
> WARNING: Freeing leaked 458752 bytes
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)