[
https://issues.apache.org/jira/browse/ASTERIXDB-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496479#comment-15496479
]
Jianfeng Jia commented on ASTERIXDB-1535:
-----------------------------------------
Here are some new updates.
The job that never finishes is JID:1125, which is an upsert job.
The cc log shows that it hasn't received the `Task Complete` msg from
partition 3.
{code}
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.cc.scheduler.ActivityClusterPlanner
planActivityCluster
INFO: Tasks: [TID:ANID:ODID:1:1:0, TID:ANID:ODID:1:1:1, TID:ANID:ODID:1:1:2,
TID:ANID:ODID:1:1:3, TID:ANID:ODID:1:1:4, TID:ANID:ODID:1:1:5,
TID:ANID:ODID:1:1:6, TID:ANID:ODID:1:1:7, TID:ANID:ODID:1:1:8,
TID:ANID:ODID:1:1:9]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete:
[cloudberry_pu[JID:1125:TAID:TID:ANID:ODID:1:1:6:0]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete:
[cloudberry_np[JID:1125:TAID:TID:ANID:ODID:1:1:4:0]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete:
[cloudberry_ac[JID:1125:TAID:TID:ANID:ODID:1:1:1:0]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete:
[cloudberry_fr[JID:1125:TAID:TID:ANID:ODID:1:1:2:0]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete:
[cloudberry_ac[JID:1125:TAID:TID:ANID:ODID:1:1:0:0]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete:
[cloudberry_pu[JID:1125:TAID:TID:ANID:ODID:1:1:7:0]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete:
[cloudberry_np[JID:1125:TAID:TID:ANID:ODID:1:1:5:0]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete:
[cloudberry_th[JID:1125:TAID:TID:ANID:ODID:1:1:8:0]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: TaskComplete:
[cloudberry_th[JID:1125:TAID:TID:ANID:ODID:1:1:9:0]
{code}
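A rough sketch of how the comparison above could be automated: collect the task ids from the `Tasks: [...]` line, collect the ids that show up in `TaskComplete` entries for the job, and report the difference. The function name and the regular expressions are my own assumptions, modelled only on the log lines quoted here; other Hyracks versions may log differently. Note that the `Tasks:` line carries no job id, so this assumes the excerpt covers a single job.

```python
import re

# Patterns modelled on the cc log lines quoted above (an assumption, not a
# documented log format).
PLANNED_BLOCK = re.compile(r"Tasks:\s*\[([^\]]*)\]")
TASK_ID = re.compile(r"TID:ANID:ODID:(\d+):(\d+):(\d+)")
COMPLETED = re.compile(
    r"TaskComplete:\s*\[\w+\[JID:(\d+):TAID:TID:ANID:ODID:(\d+):(\d+):(\d+):\d+\]"
)


def missing_task_completes(log_text, job_id):
    """Return planned (operator, activity, partition) ids with no TaskComplete."""
    # Long log lines get wrapped in mail, so collapse all whitespace first.
    text = " ".join(log_text.split())

    # Every task id listed in a "Tasks: [...]" planner line.
    planned = set()
    for block in PLANNED_BLOCK.findall(text):
        for ids in TASK_ID.findall(block):
            planned.add(tuple(int(x) for x in ids))

    # Every task id reported complete for the given job.
    completed = set()
    for jid, *ids in COMPLETED.findall(text):
        if int(jid) == job_id:
            completed.add(tuple(int(x) for x in ids))

    return sorted(planned - completed)
```

Fed the cc excerpt above with `job_id=1125`, this should report only `(1, 1, 3)`, i.e. partition 3.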
The corresponding nc log shows that the nc indeed didn't send the
`Task Complete` msg for one of its two tasks:
{code}
INFO: Executing: StartTasks
Sep 15, 2016 5:07:17 PM org.apache.hyracks.control.nc.work.StartTasksWork run
INFO: Initializing TAID:TID:ANID:ODID:1:1:2:0 ->
[org.apache.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$2@48103132,
org.apache.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$2@7bd26066,
org.apache.hyracks.dataflow.std.intersect.IntersectOperatorDescriptor$IntersectActivity@740e36f9,
org.apache.hyracks.storage.am.btree.dataflow.BTreeSearchOperatorDescriptor@2964994,
org.apache.asterix.runtime.operators.AsterixLSMTreeUpsertOperatorDescriptor@4b467155,
Asterix {
stream-project [1];
assign [1] :=
[org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@237ff90b];
stream-select
org.apache.asterix.runtime.evaluators.functions.AndDescriptor$2@26c3aa63;
stream-project [0];
assign [1] :=
[org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@16a98dc6];
}, Asterix {
assign [3] :=
[org.apache.asterix.runtime.evaluators.functions.OrDescriptor$2@6e1d9a9f];
stream-project [3, 1];
commit;
}]
Sep 15, 2016 5:07:17 PM org.apache.hyracks.control.nc.work.StartTasksWork run
INFO: Initializing TAID:TID:ANID:ODID:1:1:3:0 ->
[org.apache.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$2@48103132,
org.apache.hyracks.dataflow.std.sort.ExternalSortOperatorDescriptor$2@7bd26066,
org.apache.hyracks.dataflow.std.intersect.IntersectOperatorDescriptor$IntersectActivity@740e36f9,
org.apache.hyracks.storage.am.btree.dataflow.BTreeSearchOperatorDescriptor@2964994,
org.apache.asterix.runtime.operators.AsterixLSMTreeUpsertOperatorDescriptor@4b467155,
Asterix {
stream-project [1];
assign [1] :=
[org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@237ff90b];
stream-select
org.apache.asterix.runtime.evaluators.functions.AndDescriptor$2@26c3aa63;
stream-project [0];
assign [1] :=
[org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@16a98dc6];
}, Asterix {
assign [3] :=
[org.apache.asterix.runtime.evaluators.functions.OrDescriptor$2@6e1d9a9f];
stream-project [3, 1];
commit;
}]
Sep 15, 2016 5:07:17 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
INFO: Executing: NotifyTaskComplete
Sep 15, 2016 5:07:25 PM
org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
INFO: Result state cleanup instance successfully completed.
{code}
There should have been two `INFO: Executing: NotifyTaskComplete` entries, but
only one was logged. No exceptions occurred around that time.
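The same check on the nc side can be sketched as a count of `Initializing TAID:...` entries against `NotifyTaskComplete` entries. This is only a sketch under assumptions: the function name is hypothetical, the patterns are taken from the nc log quoted above, and since the `NotifyTaskComplete` line carries no task id it can only tell how many notifications are missing, not which task they belong to.

```python
import re

# Patterns modelled on the nc log lines quoted above (an assumption).
STARTED = re.compile(r"Initializing (TAID:TID:ANID:ODID:\d+:\d+:\d+:\d+)")
NOTIFIED = re.compile(r"Executing: NotifyTaskComplete")


def nc_task_balance(log_text):
    """Return (started task attempt ids, number of NotifyTaskComplete entries)."""
    # Long log lines get wrapped in mail, so collapse all whitespace first.
    text = " ".join(log_text.split())
    started = STARTED.findall(text)
    notified = len(NOTIFIED.findall(text))
    return started, notified


def unnotified_count(log_text):
    """Number of started tasks that have no matching notification (never negative)."""
    started, notified = nc_task_balance(log_text)
    return max(len(started) - notified, 0)
```

A non-zero result from `unnotified_count` corresponds to the situation described here: two tasks initialized, one notification logged.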
> CC stop answering query from 19002 RESTAPI port
> -----------------------------------------------
>
> Key: ASTERIXDB-1535
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1535
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: HTTP API
> Environment: master
> commit a89fae64ac21fb8eefde79f79d2dbe1a0e54c364
> Date: Wed Jul 6 07:58:55 2016 -0700
> Reporter: Jianfeng Jia
> Assignee: Ian Maxon
> Attachments: cc.jstack
>
>
> The 8888/adminconsole showed that there are many pending jobs, while
> ingestion and queries work fine on the nc.
> If this situation lasts long enough, say 2 days, the 19002 API will stop
> responding to any queries, while the web interface on port 19001 can still
> answer queries.
> I need to restart the cluster to recover the service. Before that I recorded
> the jstack log of the cc, as attached.