[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496479#comment-15496479
 ] 

Jianfeng Jia commented on ASTERIXDB-1535:
-----------------------------------------

Here are some new updates.
The never-ending job is JID:1125, which is an upsert job.
The cc log shows that it hasn't received the `Task Complete` msg from 
partition 3.
{code}
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.cc.scheduler.ActivityClusterPlanner 
planActivityCluster
INFO: Tasks: [TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2, 
TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5, 
TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8, 
TID:ANID:ODID:1:1:9]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete: 
[cloudberry_pu[JID:1125:TAID:TID:ANID:ODID:1:1:6:0]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete: 
[cloudberry_np[JID:1125:TAID:TID:ANID:ODID:1:1:4:0]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete: 
[cloudberry_ac[JID:1125:TAID:TID:ANID:ODID:1:1:1:0]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete: 
[cloudberry_fr[JID:1125:TAID:TID:ANID:ODID:1:1:2:0]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete: 
[cloudberry_ac[JID:1125:TAID:TID:ANID:ODID:1:1:0:0]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete: 
[cloudberry_pu[JID:1125:TAID:TID:ANID:ODID:1:1:7:0]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete: 
[cloudberry_np[JID:1125:TAID:TID:ANID:ODID:1:1:5:0]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete: 
[cloudberry_th[JID:1125:TAID:TID:ANID:ODID:1:1:8:0]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete: 
[cloudberry_th[JID:1125:TAID:TID:ANID:ODID:1:1:9:0]
{code}
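
The comparison above (planned tasks vs. `TaskComplete` entries) can be scripted instead of eyeballed. Below is a minimal Python sketch, not part of Hyracks: the function name `missing_task_completions` is made up for illustration, and it assumes that the last number in `TID:ANID:ODID:1:1:N` is the partition, that the second-to-last number in the `TAID:...:N:0` id is the same partition, and that the log excerpt contains the `Tasks: [...]` plan of only the job being checked.

```python
import re


def missing_task_completions(cc_log: str, job_id: int) -> list[int]:
    """Return the partitions that were planned but have no TaskComplete entry."""
    # The mail wraps long log lines, so collapse all whitespace first.
    text = " ".join(cc_log.split())

    # Partitions listed by the planner ("Tasks: [TID:ANID:ODID:1:1:0, ...]").
    planned = set()
    for block in re.findall(r"Tasks: \[(.*?)\]", text):
        planned.update(
            int(p) for p in re.findall(r"TID:ANID:ODID:\d+:\d+:(\d+)", block)
        )

    # Partitions for which the cc executed a TaskComplete work item.
    done = {
        int(p)
        for p in re.findall(
            rf"TaskComplete: \[\w+\[JID:{job_id}:TAID:TID:ANID:ODID:\d+:\d+:(\d+):\d+\]",
            text,
        )
    }
    return sorted(planned - done)
```

Calling `missing_task_completions(cc_log_text, 1125)` on the excerpt above should list only the partition whose `TaskComplete` never arrived.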

The corresponding nc log shows that the nc indeed didn't send the 
`Task Complete` msg:
{code}
INFO: Executing: StartTasks
Sep 15, 2016 5:07:17 PM org.apache.hyracks.control.nc.work.StartTasksWork run
INFO: Initializing TAID:TID:ANID:ODID:1:1:2:0 -> 
[org.apache.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$2@48103132,
 
org.apache.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$2@7bd26066, 
org.apache.hyracks.dataflow.std.intersect.IntersectOperatorDescriptor$IntersectActivity@740e36f9,
 
org.apache.hyracks.storage.am.btree.dataflow.BTreeSearchOperatorDescriptor@2964994,
 
org.apache.asterix.runtime.operators.AsterixLSMTreeUpsertOperatorDescriptor@4b467155,
 Asterix { 
  stream-project [1];
  assign [1] := 
[org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@237ff90b];
  stream-select 
org.apache.asterix.runtime.evaluators.functions.AndDescriptor$2@26c3aa63;
  stream-project [0];
  assign [1] := 
[org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@16a98dc6];
}, Asterix { 
  assign [3] := 
[org.apache.asterix.runtime.evaluators.functions.OrDescriptor$2@6e1d9a9f];
  stream-project [3, 1];
  commit;
}]
Sep 15, 2016 5:07:17 PM org.apache.hyracks.control.nc.work.StartTasksWork run
INFO: Initializing TAID:TID:ANID:ODID:1:1:3:0 -> 
[org.apache.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$2@48103132,
 
org.apache.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$2@7bd26066, 
org.apache.hyracks.dataflow.std.intersect.IntersectOperatorDescriptor$IntersectActivity@740e36f9,
 
org.apache.hyracks.storage.am.btree.dataflow.BTreeSearchOperatorDescriptor@2964994,
 
org.apache.asterix.runtime.operators.AsterixLSMTreeUpsertOperatorDescriptor@4b467155,
 Asterix { 
  stream-project [1];
  assign [1] := 
[org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@237ff90b];
  stream-select 
org.apache.asterix.runtime.evaluators.functions.AndDescriptor$2@26c3aa63;
  stream-project [0];
  assign [1] := 
[org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@16a98dc6];
}, Asterix { 
  assign [3] := 
[org.apache.asterix.runtime.evaluators.functions.OrDescriptor$2@6e1d9a9f];
  stream-project [3, 1];
  commit;
}]
Sep 15, 2016 5:07:17 PM 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: NotifyTaskComplete
Sep 15, 2016 5:07:25 PM 
org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
INFO: Result state cleanup instance successfully completed.
{code}

There were supposed to be two `INFO: Executing: NotifyTaskComplete` entries, 
but only one showed up. No exceptions were logged around that time.
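
The same check can be done on the nc side by comparing how many task attempts were initialized with how many `NotifyTaskComplete` entries follow. This is only a rough Python sketch under the assumption that the nc log looks like the excerpt above; `nc_task_balance` is a hypothetical helper name. Since the `NotifyTaskComplete` line does not name a task, only the counts can be compared.

```python
import re


def nc_task_balance(nc_log: str) -> tuple[list[str], int]:
    """Return (initialized task attempt ids, number of NotifyTaskComplete entries)."""
    # Undo the line wrapping introduced by the mail before matching.
    text = " ".join(nc_log.split())

    # Task attempts started by StartTasksWork ("Initializing TAID:... -> [...]").
    started = re.findall(
        r"Initializing (TAID:TID:ANID:ODID:\d+:\d+:\d+:\d+)", text
    )

    # Completion notifications executed by the nc work queue.
    notified = len(re.findall(r"Executing: NotifyTaskComplete", text))
    return started, notified
```

If `len(started)` is larger than `notified`, at least one task on that nc never reported completion.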

> CC stop answering query from 19002 RESTAPI port
> -----------------------------------------------
>
>                 Key: ASTERIXDB-1535
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1535
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: HTTP API
>         Environment: master
> commit a89fae64ac21fb8eefde79f79d2dbe1a0e54c364
> Date:   Wed Jul 6 07:58:55 2016 -0700
>            Reporter: Jianfeng Jia
>            Assignee: Ian Maxon
>         Attachments: cc.jstack
>
>
> The 8888/adminconsole showed that there are many pending jobs while the 
> ingestion and the queries work fine in nc. 
> If this situation lasts long enough, say 2 days, the 19002 API will stop 
> responding to any queries, while the web interface on port 19001 can still 
> answer queries.
> I needed to restart the cluster to recover the service. Before that I recorded 
> the jstack log of the cc, as attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
