[ https://issues.apache.org/jira/browse/ASTERIXDB-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830654#comment-16830654 ]
Wail Alkowaileet commented on ASTERIXDB-2517:
---------------------------------------------
I hit the same issue, except that there is no ingestion in my case. I have a
similar configuration (one CC and one NC) on the same machine.
{code:java}
12:45:14.422 [TCPEndpoint IO Thread [null]] ERROR org.apache.hyracks.net.protocols.tcp.TCPEndpoint - Unexpected tcp io error in connection TCPConnection[Remote Address: /127.0.0.1:36955 Local Address: null]
org.apache.hyracks.api.exceptions.NetException: Socket Closed
    at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:342) ~[hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:113) ~[hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:186) [hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.244 [Executor-3:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 1 to shutdown to ensure failure
12:46:04.245 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - 1 considered dead. Last heartbeat received 50558ms ago. Max miss period: 50000ms
12:46:04.245 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.work.RemoveDeadNodesWork - Number of affected jobs: 1
12:46:04.249 [Executor-3:ClusterController] WARN org.apache.hyracks.ipc.impl.ReconnectingIPCHandle - ipcHandle IPCHandle [addr=/127.0.0.1:44551 state=CLOSED] disconnected; will attempt to reconnect 1 times
12:46:04.251 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:1099]] WARN org.apache.hyracks.ipc.impl.IPCConnectionManager - Exception finishing connect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_201]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_201]
    at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.finishConnect(IPCConnectionManager.java:239) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.processSelectedKeys(IPCConnectionManager.java:229) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.doRun(IPCConnectionManager.java:200) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.run(IPCConnectionManager.java:181) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.256 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:1099]] WARN org.apache.hyracks.ipc.impl.IPCConnectionManager - Failed to finish connect to /127.0.0.1:44551
12:46:04.257 [Executor-3:ClusterController] WARN org.apache.hyracks.ipc.impl.IPCConnectionManager - Connection to /127.0.0.1:44551 failed; retrying (retry attempt 1 of 1) after 100ms
12:46:04.265 [Worker:ClusterController] ERROR org.apache.hyracks.control.cc.executor.JobExecutor - Unexpected failure. Aborting job JID:0.13
org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 1 does not exist
    at org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:57) ~[hyracks-api-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732) [hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(RemoveDeadNodesWork.java:60) [hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [hyracks-control-common-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.hyracks.bootstrap.ClusterLifecycleListener - NC: 1 left
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.runtime.utils.ClusterStateManager - Removing configuration parameters for node id 1
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.runtime.utils.ClusterStateManager - updating cluster state from ACTIVE to UNUSABLE
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.runtime.utils.ClusterStateManager - Cluster State is now UNUSABLE
12:46:04.265 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.work.JobCleanupWork - Cleanup for job: JID:0.13
12:46:04.268 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:40433]] INFO org.apache.hyracks.ipc.impl.IPCSystem - Exception in message
org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 1 does not exist
    at org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:57) ~[hyracks-api-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(RemoveDeadNodesWork.java:60) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
    at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) ~[hyracks-control-common-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
{code}
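In the log, the CC declares node 1 dead because its last heartbeat arrived 50558 ms earlier, which exceeds the 50000 ms max miss period; job JID:0.13 is then aborted with HYR0010 while the executor tries to reassign task locations to the now-missing node. The dead-node check can be sketched roughly as below. This is a hypothetical illustration only, assuming a simple map of per-node last-heartbeat timestamps and a strict greater-than comparison; the class and method names are invented and are not the actual Hyracks NodeManager code.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of heartbeat-based dead-node detection.
// NOT the actual Hyracks NodeManager implementation.
public class HeartbeatMonitor {
    private final long maxMissPeriodMs;
    // Assumption: the CC keeps the timestamp of the last heartbeat per node id.
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    public HeartbeatMonitor(long maxMissPeriodMs) {
        this.maxMissPeriodMs = maxMissPeriodMs;
    }

    /** Records a heartbeat from the given node at the given time (ms). */
    public void onHeartbeat(String nodeId, long nowMs) {
        lastHeartbeatMs.put(nodeId, nowMs);
    }

    /** Returns the ids of nodes whose last heartbeat is older than the max miss period. */
    public List<String> findDeadNodes(long nowMs) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            long sinceLastMs = nowMs - e.getValue();
            // Assumption: strictly greater than the max miss period means dead.
            if (sinceLastMs > maxMissPeriodMs) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(50_000);
        monitor.onHeartbeat("1", 0);
        // 50558 ms later node "1" has missed the 50000 ms window, as in the log.
        System.out.println(monitor.findDeadNodes(50_558)); // prints [1]
    }
}
{code}

With a single NC, as in both setups here, declaring that one node dead leaves no node to run the job on, which is consistent with the cluster state moving from ACTIVE to UNUSABLE in the log.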
> Ingestion process failed on a cluster with two machines.
> --------------------------------------------------------
>
> Key: ASTERIXDB-2517
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2517
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Priority: Major
> Attachments: cc.log, nc-1.log
>
>
> We have a cluster with two machines. Out of 1.5 billion records, about 1.2
> billion were ingested using a socket adapter. However, NC-1, which is
> located on the same machine, was shut down at around 19:21 (please see the
> log records around that time). I have attached the log records.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)