[
https://issues.apache.org/jira/browse/ASTERIXDB-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420264#comment-15420264
]
Jianfeng Jia commented on ASTERIXDB-1535:
-----------------------------------------
It happened again, but this time I checked the log and saw some exceptions.
{code}
INFO: NO NEED TO NOTIFY JOB FINISH!
org.apache.hyracks.api.exceptions.HyracksException: Job failed on account of:
org.apache.hyracks.api.exceptions.HyracksDataException:
org.apache.hyracks.api.exceptions.HyracksDataException:
java.nio.channels.ClosedChannelException
at org.apache.hyracks.control.cc.job.JobRun.waitForCompletion(JobRun.java:212)
at org.apache.hyracks.control.cc.work.WaitForJobCompletionWork$1.run(WaitForJobCompletionWork.java:48)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException:
org.apache.hyracks.api.exceptions.HyracksDataException:
org.apache.hyracks.api.exceptions.HyracksDataException:
java.nio.channels.ClosedChannelException
at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:45)
at org.apache.hyracks.control.nc.Task.run(Task.java:319)
... 3 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException:
org.apache.hyracks.api.exceptions.HyracksDataException:
java.nio.channels.ClosedChannelException
at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:365)
at org.apache.hyracks.control.nc.Task.run(Task.java:297)
... 3 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException:
java.nio.channels.ClosedChannelException
at org.apache.hyracks.control.nc.io.IOManager.syncRead(IOManager.java:175)
at org.apache.hyracks.storage.common.buffercache.BufferCache.read(BufferCache.java:575)
at org.apache.hyracks.storage.common.buffercache.BufferCache.pin(BufferCache.java:211)
at org.apache.hyracks.storage.am.common.freepage.LinkedMetaDataPageManager.getFirstMetadataPage(LinkedMetaDataPageManager.java:376)
at org.apache.hyracks.storage.am.common.impls.AbstractTreeIndex.activate(AbstractTreeIndex.java:188)
at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndexFileManager.isValidTreeIndex(AbstractLSMIndexFileManager.java:83)
at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndexFileManager.cleanupAndGetValidFilesInternal(AbstractLSMIndexFileManager.java:114)
at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeFileManager.cleanupAndGetValidFiles(LSMBTreeFileManager.java:95)
at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.activate(LSMBTree.java:180)
at org.apache.asterix.common.context.DatasetLifecycleManager.open(DatasetLifecycleManager.java:209)
at org.apache.hyracks.storage.am.common.dataflow.IndexDataflowHelper.open(IndexDataflowHelper.java:116)
at org.apache.asterix.runtime.operators.AsterixLSMPrimaryUpsertOperatorNodePushable.open(AsterixLSMPrimaryUpsertOperatorNodePushable.java:115)
at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:341)
... 4 more
Caused by: java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:94)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:673)
at org.apache.hyracks.control.nc.io.IOManager.syncRead(IOManager.java:163)
... 16 more
{code}
I think this is not a CC problem; it is more likely still an NC problem, as in
ASTERIXDB-1534.
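
For reference, the innermost frames of the trace (FileChannelImpl.ensureOpen called from FileChannelImpl.read) are what the JDK produces whenever a read is issued on a FileChannel that has already been closed. The sketch below only demonstrates that JDK behavior in isolation. It does not reproduce the AsterixDB code path or say why the channel was closed on the NC; the class name ClosedChannelDemo and the temporary file are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical standalone demo, not AsterixDB or Hyracks code.
public class ClosedChannelDemo {

    // Returns true if a positional read on an already-closed channel
    // raises ClosedChannelException, false if the read unexpectedly succeeds.
    static boolean readAfterClose() throws IOException {
        Path tmp = Files.createTempFile("closed-channel-demo", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3, 4});
            FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ);
            // Stand-in for whatever closed the file handle before the read.
            ch.close();
            try {
                ch.read(ByteBuffer.allocate(4), 0L);
                return false;
            } catch (ClosedChannelException e) {
                return true;
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read after close threw ClosedChannelException: " + readAfterClose());
    }
}
```

If the NC hits the same situation, the question is which component closed the file handle before BufferCache.read tried to use it.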
> CC stop answering query from 19002 RESTAPI port
> -----------------------------------------------
>
> Key: ASTERIXDB-1535
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1535
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: HTTP API
> Environment: master
> commit a89fae64ac21fb8eefde79f79d2dbe1a0e54c364
> Date: Wed Jul 6 07:58:55 2016 -0700
> Reporter: Jianfeng Jia
> Assignee: Ian Maxon
> Attachments: cc.jstack
>
>
> The 8888/adminconsole shows many pending jobs, while ingestion and queries
> still work fine on the NC.
> If this situation lasts long enough, say 2 days, the 19002 API stops
> responding to any queries, while the web interface on port 19001 can still
> answer queries.
> I need to restart the cluster to recover the service. Before doing that, I
> recorded a jstack dump of the CC, attached.