[jira] [Commented] (ASTERIXDB-2517) Ingestion process failed on a cluster with two machines.

Wail Alkowaileet (JIRA) Tue, 30 Apr 2019 13:13:43 -0700


    [ https://issues.apache.org/jira/browse/ASTERIXDB-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830654#comment-16830654 ]


Wail Alkowaileet commented on ASTERIXDB-2517:
---------------------------------------------

I hit the same issue, except that there is no ingestion in my case. I have a 
similar configuration (one CC and one NC), both on the same machine.
{code:java}
12:45:14.422 [TCPEndpoint IO Thread [null]] ERROR 
org.apache.hyracks.net.protocols.tcp.TCPEndpoint - Unexpected tcp io error in 
connection TCPConnection[Remote Address: /127.0.0.1:36955 Local Address: null]
org.apache.hyracks.api.exceptions.NetException: Socket Closed
at 
org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:342)
 ~[hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:113)
 ~[hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:186)
 [hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.244 [Executor-3:ClusterController] INFO 
org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 1 to 
shutdown to ensure failure
12:46:04.245 [Worker:ClusterController] INFO 
org.apache.hyracks.control.cc.cluster.NodeManager - 1 considered dead. Last 
heartbeat received 50558ms ago. Max miss period: 50000ms
12:46:04.245 [Worker:ClusterController] INFO 
org.apache.hyracks.control.cc.work.RemoveDeadNodesWork - Number of affected 
jobs: 1
12:46:04.249 [Executor-3:ClusterController] WARN 
org.apache.hyracks.ipc.impl.ReconnectingIPCHandle - ipcHandle IPCHandle 
[addr=/127.0.0.1:44551 state=CLOSED] disconnected; will attempt to reconnect 1 
times
12:46:04.251 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:1099]] WARN 
org.apache.hyracks.ipc.impl.IPCConnectionManager - Exception finishing connect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_201]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 
~[?:1.8.0_201]
at 
org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.finishConnect(IPCConnectionManager.java:239)
 [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.processSelectedKeys(IPCConnectionManager.java:229)
 [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.doRun(IPCConnectionManager.java:200)
 [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.run(IPCConnectionManager.java:181)
 [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.256 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:1099]] WARN 
org.apache.hyracks.ipc.impl.IPCConnectionManager - Failed to finish connect to 
/127.0.0.1:44551
12:46:04.257 [Executor-3:ClusterController] WARN 
org.apache.hyracks.ipc.impl.IPCConnectionManager - Connection to 
/127.0.0.1:44551 failed; retrying (retry attempt 1 of 1) after 100ms
12:46:04.265 [Worker:ClusterController] ERROR 
org.apache.hyracks.control.cc.executor.JobExecutor - Unexpected failure. 
Aborting job JID:0.13
org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 1 does not 
exist
at 
org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:57)
 ~[hyracks-api-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732)
 [hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(RemoveDeadNodesWork.java:60)
 [hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
 [hyracks-control-common-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.265 [Worker:ClusterController] INFO 
org.apache.asterix.hyracks.bootstrap.ClusterLifecycleListener - NC: 1 left
12:46:04.265 [Worker:ClusterController] INFO 
org.apache.asterix.runtime.utils.ClusterStateManager - Removing configuration 
parameters for node id 1
12:46:04.265 [Worker:ClusterController] INFO 
org.apache.asterix.runtime.utils.ClusterStateManager - updating cluster state 
from ACTIVE to UNUSABLE
12:46:04.265 [Worker:ClusterController] INFO 
org.apache.asterix.runtime.utils.ClusterStateManager - Cluster State is now 
UNUSABLE
12:46:04.265 [Worker:ClusterController] INFO 
org.apache.hyracks.control.cc.work.JobCleanupWork - Cleanup for job: JID:0.13
12:46:04.268 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:40433]] INFO 
org.apache.hyracks.ipc.impl.IPCSystem - Exception in message
org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 1 does not 
exist
at 
org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:57)
 ~[hyracks-api-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(RemoveDeadNodesWork.java:60)
 ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
at 
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
 ~[hyracks-control-common-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
{code}
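
For anyone reading the log above: the NodeManager line reports a heartbeat gap of 50558ms against a max miss period of 50000ms, so the node was declared dead 558ms past the limit. Below is a minimal sketch for pulling those two numbers out of such a message. The class and method names (HeartbeatCheck, isConsideredDead, overshootMillis) are hypothetical and are not part of Hyracks, and the strict "greater than" comparison is an assumption based only on this one log line.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeartbeatCheck {
    // Matches the NodeManager message quoted in the log above; \s+ tolerates
    // the line wrapping introduced by the e-mail.
    private static final Pattern DEAD_NODE = Pattern.compile(
            "considered\\s+dead\\.\\s+Last\\s+heartbeat\\s+received\\s+(\\d+)ms\\s+ago\\."
                    + "\\s+Max\\s+miss\\s+period:\\s+(\\d+)ms");

    // Assumption (not taken from the Hyracks source): a node counts as dead
    // once the gap since its last heartbeat exceeds the allowed miss period.
    static boolean isConsideredDead(long sinceLastHeartbeatMs, long maxMissPeriodMs) {
        return sinceLastHeartbeatMs > maxMissPeriodMs;
    }

    // Returns gap minus limit in milliseconds, or empty if the message
    // does not have the expected shape.
    static OptionalLong overshootMillis(String message) {
        Matcher m = DEAD_NODE.matcher(message);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        long gapMs = Long.parseLong(m.group(1));
        long limitMs = Long.parseLong(m.group(2));
        return OptionalLong.of(gapMs - limitMs);
    }

    public static void main(String[] args) {
        String msg = "1 considered dead. Last heartbeat received 50558ms ago. "
                + "Max miss period: 50000ms";
        System.out.println(overshootMillis(msg).getAsLong()); // prints 558
        System.out.println(isConsideredDead(50558L, 50000L)); // prints true
    }
}
```

The small overshoot suggests the NC stopped sending heartbeats rather than being slightly late, but the log alone does not show why.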

> Ingestion process failed on a cluster with two machines.
> --------------------------------------------------------
>
>                 Key: ASTERIXDB-2517
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2517
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Priority: Major
>         Attachments: cc.log, nc-1.log
>
>
> We have a cluster with two machines. Out of 1.5 billion records, about 1.2 
> billion were ingested using a socket adapter. However, NC-1, which is 
> located on the same machine, was shut down at around 19:21 (please see the 
> log records around that time). I have attached the logs.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
