[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002505#comment-15002505 ]

Thomas Graves commented on SPARK-11701:
---------------------------------------

I'm not exactly sure if this is the same issue, but trying this on a version of 
1.6 (not the latest), after running a wordcount job we get a bunch of errors 
and it shuts down the SparkContext:

15/11/12 17:30:49 ERROR TransportChannelHandler: Connection to gsbl544n27.blue.ygrid.yahoo.com/10.213.42.242:33217 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
15/11/12 17:30:49 ERROR TransportResponseHandler: Still have 15 requests outstanding when connection from gsbl544n27.blue.ygrid.yahoo.com/10.213.42.242:33217 is closed
15/11/12 17:30:49 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 2 at RPC address gsbl536n11.blue.ygrid.yahoo.com:47496, but got no response. Marking as slave lost.
java.io.IOException: Connection from gsbl544n27.blue.ygrid.yahoo.com/10.213.42.242:33217 closed
        at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:104)
        at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:91)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
        at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)
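
The log suggests raising spark.network.timeout if the connection is in fact 
still alive. As a minimal sketch only (the 600s value is just an illustration, 
not a recommended setting), that could be passed on the same spark-shell 
invocation used above:

$SPARK_HOME/bin/spark-shell --conf spark.network.timeout=600s --master yarn --deploy-mode client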


> YARN - dynamic allocation and speculation active task accounting wrong
> ----------------------------------------------------------------------
>
>                 Key: SPARK-11701
>                 URL: https://issues.apache.org/jira/browse/SPARK-11701
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Thomas Graves
>            Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor even though the job/stage has completed.  I think it's also 
> preventing dynamic allocation from releasing containers, because it thinks 
> there are still tasks running.
> It's easily reproduced with spark-shell: turn on dynamic allocation, run just 
> a wordcount on a decent-sized file, and set the speculation parameters low 
> (a wordcount sketch follows the command line below): 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g
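> For illustration only (the input path here is a placeholder), the wordcount 
> run inside the shell can be as simple as:
>   sc.textFile("hdfs:///path/to/decent-sized-file")  // any reasonably large input
>     .flatMap(_.split("\\s+"))                       // split lines into words
>     .map((_, 1))                                    // pair each word with a count of 1
>     .reduceByKey(_ + _)                             // sum counts per word
>     .count()                                        // action that runs the stages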


