[ https://issues.apache.org/jira/browse/SPARK-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255008#comment-14255008 ]

Derrick Burns commented on SPARK-2681:
--------------------------------------

Appears to still happen in 1.1.1:

2014-12-20 22:54:00,574 INFO  [connection-manager-thread] 
network.ConnectionManager (Logging.scala:logInfo(59)) - Key not valid ? 
sun.nio.ch.SelectionKeyImpl@6045fa68
2014-12-20 22:54:00,574 INFO  [handle-read-write-executor-0] 
network.ConnectionManager (Logging.scala:logInfo(59)) - Removing 
SendingConnection to 
ConnectionManagerId(ip-10-89-134-186.us-west-2.compute.internal,49171)
2014-12-20 22:54:00,574 INFO  [handle-read-write-executor-2] 
network.ConnectionManager (Logging.scala:logInfo(59)) - Removing 
ReceivingConnection to 
ConnectionManagerId(ip-10-89-134-186.us-west-2.compute.internal,49171)
2014-12-20 22:54:00,575 INFO  [sparkDriver-akka.actor.default-dispatcher-14] 
cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(59)) - Executor 7 
disconnected, so removing it
2014-12-20 22:54:00,576 ERROR [handle-read-write-executor-2] 
network.ConnectionManager (Logging.scala:logError(75)) - Corresponding 
SendingConnection to 
ConnectionManagerId(ip-10-89-134-186.us-west-2.compute.internal,49171) not found
2014-12-20 22:54:00,576 ERROR [sparkDriver-akka.actor.default-dispatcher-14] 
cluster.YarnClientClusterScheduler (Logging.scala:logError(75)) - Lost executor 
7 on ip-10-89-134-186.us-west-2.compute.internal: remote Akka client 
disassociated
2014-12-20 22:54:00,576 INFO  [connection-manager-thread] 
network.ConnectionManager (Logging.scala:logInfo(80)) - key already cancelled ? 
sun.nio.ch.SelectionKeyImpl@6045fa68
java.nio.channels.CancelledKeyException
        at 
org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:392)
        at 
org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:145)
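
For context, the "Key not valid ?" / "key already cancelled ?" pair is the classic symptom of one thread cancelling a SelectionKey (here, the connection-teardown path on a handle-read-write-executor thread) while the selector thread still holds a reference to it and touches it afterwards. A minimal sketch of that failure mode in plain java.nio — this is illustrative only, not Spark's actual ConnectionManager code, and the single-threaded cancel stands in for the cross-thread race:

```java
import java.nio.channels.CancelledKeyException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class CancelledKeyDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT);

        // Teardown path cancels the key (as the "Removing
        // SendingConnection" handler does in the log above) ...
        key.cancel();

        // ... but the selector thread still has the stale reference and
        // touches it, which now throws CancelledKeyException.
        try {
            key.interestOps(SelectionKey.OP_READ);
        } catch (CancelledKeyException e) {
            System.out.println("key already cancelled ? " + key);
        } finally {
            channel.close();
            selector.close();
        }
    }
}
```

In the real ConnectionManager the cancel and the selector iteration happen on different threads, so whether the exception fires is timing-dependent — which would explain why this only shows up intermittently around executor loss.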

> Spark can hang when fetching shuffle blocks
> -------------------------------------------
>
>                 Key: SPARK-2681
>                 URL: https://issues.apache.org/jira/browse/SPARK-2681
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.1
>            Reporter: Guoqiang Li
>            Priority: Blocker
>         Attachments: jstack-26027.log
>
>
> executor log :
> {noformat}
> 14/07/24 22:56:52 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 53628
> 14/07/24 22:56:52 INFO executor.Executor: Running task ID 53628
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally
> 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Updating epoch to 236 
> and clearing cache
> 14/07/24 22:56:52 INFO spark.CacheManager: Partition rdd_51_83 not found, 
> computing it
> 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Don't have map outputs 
> for shuffle 9, fetching them
> 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker 
> actor = 
> Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395]
> 14/07/24 22:56:53 INFO spark.MapOutputTrackerWorker: Got the output locations
> 14/07/24 22:56:53 INFO 
> storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 
> 50331648, targetRequestSize: 10066329
> 14/07/24 22:56:53 INFO 
> storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 
> non-empty blocks out of 1024 blocks
> 14/07/24 22:56:53 INFO 
> storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote 
> fetches in 8 ms
> 14/07/24 22:56:55 INFO storage.MemoryStore: ensureFreeSpace(28728) called 
> with curMem=920109320, maxMem=4322230272
> 14/07/24 22:56:55 INFO storage.MemoryStore: Block rdd_51_83 stored as values 
> to memory (estimated size 28.1 KB, free 3.2 GB)
> 14/07/24 22:56:55 INFO storage.BlockManagerMaster: Updated info of block 
> rdd_51_83
> 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_189_83 not found, 
> computing it
> 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Don't have map outputs 
> for shuffle 28, fetching them
> 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker 
> actor = 
> Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395]
> 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Got the output locations
> 14/07/24 22:56:55 INFO 
> storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 
> 50331648, targetRequestSize: 10066329
> 14/07/24 22:56:55 INFO 
> storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty 
> blocks out of 1024 blocks
> 14/07/24 22:56:55 INFO 
> storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 1 remote 
> fetches in 0 ms
> 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_50_83 not found, 
> computing it
> 14/07/24 22:56:55 INFO 
> storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 
> 50331648, targetRequestSize: 10066329
> 14/07/24 22:56:55 INFO 
> storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 
> non-empty blocks out of 1024 blocks
> 14/07/24 22:56:55 INFO 
> storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote 
> fetches in 4 ms
> 14/07/24 22:57:09 INFO network.ConnectionManager: Removing 
> ReceivingConnection to ConnectionManagerId(tuan221,51153)
> 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection 
> to ConnectionManagerId(tuan221,51153)
> 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection 
> to ConnectionManagerId(tuan221,51153)
> 14/07/24 23:05:07 INFO network.ConnectionManager: Key not valid ? 
> sun.nio.ch.SelectionKeyImpl@3dcc1da1
> 14/07/24 23:05:07 INFO network.ConnectionManager: Removing SendingConnection 
> to ConnectionManagerId(tuan211,43828)
> 14/07/24 23:05:07 INFO network.ConnectionManager: key already cancelled ? 
> sun.nio.ch.SelectionKeyImpl@3dcc1da1
> java.nio.channels.CancelledKeyException
>       at 
> org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:363)
>       at 
> org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
> 14/07/24 23:05:07 INFO network.ConnectionManager: Removing 
> ReceivingConnection to ConnectionManagerId(tuan211,43828)
> 14/07/24 23:05:07 ERROR network.ConnectionManager: Corresponding 
> SendingConnectionManagerId not found
> {noformat}
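
A side note on the fetch sizing in the quoted log: if I'm reading the 1.x BasicBlockFetcherIterator correctly, targetRequestSize is derived as max(maxBytesInFlight / 5, 1), i.e. requests are sized so up to five can be outstanding at once, and maxBytesInFlight defaults to 48 MB (spark.reducer.maxMbInFlight). That reproduces the figures in the log exactly:

```java
public class FetchSizing {
    public static void main(String[] args) {
        // Default maxBytesInFlight in 1.x: 48 MB.
        long maxBytesInFlight = 48L * 1024 * 1024;          // 50331648, as logged
        // Assumed 1.x formula: size each request so ~5 fit in flight.
        long targetRequestSize = Math.max(maxBytesInFlight / 5, 1L);
        System.out.println(targetRequestSize);              // 10066329, as logged
    }
}
```

So the sizing itself looks healthy here; the hang is in the connection layer, not in how the fetches are batched.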



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
