[ https://issues.apache.org/jira/browse/ASTERIXDB-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420264#comment-15420264 ]
Jianfeng Jia commented on ASTERIXDB-1535:
-----------------------------------------

It happened again, but this time I checked the log and saw some exceptions.

{code}
INFO: NO NEED TO NOTIFY JOB FINISH!
org.apache.hyracks.api.exceptions.HyracksException: Job failed on account of:
org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: java.nio.channels.ClosedChannelException
	at org.apache.hyracks.control.cc.job.JobRun.waitForCompletion(JobRun.java:212)
	at org.apache.hyracks.control.cc.work.WaitForJobCompletionWork$1.run(WaitForJobCompletionWork.java:48)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: java.nio.channels.ClosedChannelException
	at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:45)
	at org.apache.hyracks.control.nc.Task.run(Task.java:319)
	... 3 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: java.nio.channels.ClosedChannelException
	at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:365)
	at org.apache.hyracks.control.nc.Task.run(Task.java:297)
	... 3 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: java.nio.channels.ClosedChannelException
	at org.apache.hyracks.control.nc.io.IOManager.syncRead(IOManager.java:175)
	at org.apache.hyracks.storage.common.buffercache.BufferCache.read(BufferCache.java:575)
	at org.apache.hyracks.storage.common.buffercache.BufferCache.pin(BufferCache.java:211)
	at org.apache.hyracks.storage.am.common.freepage.LinkedMetaDataPageManager.getFirstMetadataPage(LinkedMetaDataPageManager.java:376)
	at org.apache.hyracks.storage.am.common.impls.AbstractTreeIndex.activate(AbstractTreeIndex.java:188)
	at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndexFileManager.isValidTreeIndex(AbstractLSMIndexFileManager.java:83)
	at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndexFileManager.cleanupAndGetValidFilesInternal(AbstractLSMIndexFileManager.java:114)
	at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeFileManager.cleanupAndGetValidFiles(LSMBTreeFileManager.java:95)
	at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.activate(LSMBTree.java:180)
	at org.apache.asterix.common.context.DatasetLifecycleManager.open(DatasetLifecycleManager.java:209)
	at org.apache.hyracks.storage.am.common.dataflow.IndexDataflowHelper.open(IndexDataflowHelper.java:116)
	at org.apache.asterix.runtime.operators.AsterixLSMPrimaryUpsertOperatorNodePushable.open(AsterixLSMPrimaryUpsertOperatorNodePushable.java:115)
	at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:341)
	... 4 more
Caused by: java.nio.channels.ClosedChannelException
	at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:94)
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:673)
	at org.apache.hyracks.control.nc.io.IOManager.syncRead(IOManager.java:163)
	... 16 more
{code}

I don't think this is a CC problem; like ASTERIXDB-1534, it still looks like an NC problem.

> CC stop answering query from 19002 RESTAPI port
> -----------------------------------------------
>
>                 Key: ASTERIXDB-1535
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1535
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: HTTP API
>         Environment: master
> commit a89fae64ac21fb8eefde79f79d2dbe1a0e54c364
> Date:   Wed Jul 6 07:58:55 2016 -0700
>            Reporter: Jianfeng Jia
>            Assignee: Ian Maxon
>         Attachments: cc.jstack
>
>
> The 8888/adminconsole showed many pending jobs, while ingestion and queries
> work fine on the NC.
> If this situation lasts long enough, say 2 days, the 19002 API stops
> responding to any queries, while the web interface on port 19001 can still
> answer queries.
> I need to restart the cluster to recover the service. Before doing so, I
> recorded a jstack dump of the CC, which is attached.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
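The innermost cause in the trace is a plain `java.nio.channels.ClosedChannelException` thrown from `FileChannelImpl.ensureOpen`, i.e. the NC attempted to read a page from a file channel that had already been closed. The standalone sketch below only illustrates that JDK behavior; it does not reproduce the AsterixDB bug. The class and method names (`ClosedChannelDemo`, `readAfterCloseThrows`) are hypothetical, and the explicit `close()` is an assumed stand-in for whatever closed the index file underneath `IOManager.syncRead`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ClosedChannelDemo {

    /**
     * Hypothetical helper: returns true if a positional read on an
     * already-closed FileChannel fails with ClosedChannelException.
     */
    static boolean readAfterCloseThrows() {
        Path tmp = null;
        try {
            // A small temp file stands in for an index file on the NC.
            tmp = Files.createTempFile("page", ".bin");
            Files.write(tmp, new byte[] {1, 2, 3, 4});

            FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ);
            // Assumption: some other code path closes the handle before the read.
            ch.close();
            try {
                // Positional read of one "page"; the channel is closed, so this throws.
                ch.read(ByteBuffer.allocate(4), 0L);
                return false;
            } catch (ClosedChannelException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the temp file.
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterCloseThrows());
    }
}
```

Running `main` prints `true`: the JDK rejects any read on a closed channel before touching the file, which is why the trace bottoms out in `ensureOpen` rather than in an actual disk error.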