[ 
https://issues.apache.org/jira/browse/SPARK-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403247#comment-15403247
 ] 

Tao Wang commented on SPARK-14559:
----------------------------------

Hi [~zsxwing], sadly the application has ended now, so I can't get the thread 
info :(

But I am sure the AM was fine at that point (even though both attempts 
eventually failed because too many executors failed).

Another point: after AM attempt 1 failed, attempt 2 started at 11:30, but the 
RegisterClusterManager message was not handled by the driver until around 18:30.

The dispatcher thread that handled the RegisterClusterManager message was busy 
the whole time, while some of the other threads in the pool (40 threads in 
total) were idle.

So we suspect that the message dispatching logic has a corner case that still 
needs to be covered.

This is all we can get from the log. If you need other information, I will try 
to find it in the logs we still have :(

> Netty RPC didn't check channel is active before sending message
> ---------------------------------------------------------------
>
>                 Key: SPARK-14559
>                 URL: https://issues.apache.org/jira/browse/SPARK-14559
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0, 1.6.1
>         Environment: spark1.6.1 hadoop2.2.0 jdk1.8.0_65
>            Reporter: cen yuhai
>
> I have a long-running service. After running for several hours, it threw 
> these exceptions. I found that before sending an RPC request by calling the 
> sendRpc method in TransportClient, there is no check of whether the channel 
> is still open or active.
> java.nio.channels.ClosedChannelException
> 16/04/12 11:24:00 ERROR TransportClient: Failed to send RPC 5635696155204230556 to bigdata-arch-hdp407.bh.diditaxi.com/10.234.23.107:55197: java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
> 16/04/12 11:24:00 ERROR TransportClient: Failed to send RPC 7319486003318455703 to bigdata-arch-hdp1235.bh.diditaxi.com/10.168.145.239:36439: java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
> 16/04/12 11:24:00 ERROR TransportClient: Failed to send RPC 9041854451893215954 to bigdata-arch-hdp1398.bh.diditaxi.com/10.248.117.216:26801: java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
> 16/04/12 11:24:00 ERROR TransportClient: Failed to send RPC 6046473497871624501 to bigdata-arch-hdp948.bh.diditaxi.com/10.118.114.81:41903: java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
> 16/04/12 11:24:00 ERROR TransportClient: Failed to send RPC 9085605650438705047 to bigdata-arch-hdp1126.bh.diditaxi.com/10.168.146.78:27023: java.nio.channels.ClosedChannelException
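
For illustration only, the kind of check the issue description asks for could look roughly like the sketch below. It is not the actual TransportClient code and not a proposed patch: SimpleChannel, SendCallback, FakeChannel and sendIfActive are hypothetical names made up for this example. SimpleChannel only stands in for the isOpen()/isActive() state checks that a Netty Channel provides, so the sketch runs without any Netty dependency.

```java
import java.nio.channels.ClosedChannelException;
import java.util.ArrayList;
import java.util.List;

public class ActiveCheckSketch {

    /** Hypothetical stand-in for a channel: two state checks plus a write call. */
    interface SimpleChannel {
        boolean isOpen();
        boolean isActive();
        void write(String message);
    }

    /** Hypothetical failure callback, loosely modelled on an RPC response callback. */
    interface SendCallback {
        void onFailure(Throwable cause);
    }

    /** In-memory channel used only for this demonstration. */
    static final class FakeChannel implements SimpleChannel {
        private boolean open = true;
        final List<String> written = new ArrayList<>();

        public boolean isOpen() { return open; }
        public boolean isActive() { return open; }
        public void write(String message) { written.add(message); }
        void close() { open = false; }
    }

    /**
     * Hands the message to the channel only if it is open and active.
     * Otherwise the callback is failed immediately and nothing is written.
     * Returns true if the message was handed to the channel.
     */
    static boolean sendIfActive(SimpleChannel channel, String message, SendCallback callback) {
        if (!channel.isOpen() || !channel.isActive()) {
            callback.onFailure(new ClosedChannelException());
            return false;
        }
        channel.write(message);
        return true;
    }

    public static void main(String[] args) {
        FakeChannel channel = new FakeChannel();
        List<Throwable> failures = new ArrayList<>();

        // Channel is open: the message is written.
        System.out.println("sent rpc-1: " + sendIfActive(channel, "rpc-1", failures::add));

        // Channel is closed: the send is rejected before any write is attempted.
        channel.close();
        System.out.println("sent rpc-2: " + sendIfActive(channel, "rpc-2", failures::add));
        System.out.println("failures recorded: " + failures.size());
    }
}
```

Note that a check like this can only fail fast. The channel may still close between the check and the write, so the failure handling on the write itself remains necessary.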


