[ https://issues.apache.org/jira/browse/SPARK-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guoqiang Li updated SPARK-2681:
-------------------------------

    Attachment: jstack-26027.log

[~pwendell] The jstack output has been uploaded.

> Spark can hang when fetching shuffle blocks
> -------------------------------------------
>
>                 Key: SPARK-2681
>                 URL: https://issues.apache.org/jira/browse/SPARK-2681
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.1
>            Reporter: Guoqiang Li
>            Priority: Blocker
>         Attachments: jstack-26027.log
>
>
> executor log :
> {noformat}
> 14/07/24 22:56:52 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 53628
> 14/07/24 22:56:52 INFO executor.Executor: Running task ID 53628
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_3 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_18 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_16 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_19 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_20 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_21 locally
> 14/07/24 22:56:52 INFO storage.BlockManager: Found block broadcast_22 locally
> 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Updating epoch to 236 and clearing cache
> 14/07/24 22:56:52 INFO spark.CacheManager: Partition rdd_51_83 not found, computing it
> 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 9, fetching them
> 14/07/24 22:56:52 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395]
> 14/07/24 22:56:53 INFO spark.MapOutputTrackerWorker: Got the output locations
> 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
> 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks
> 14/07/24 22:56:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 8 ms
> 14/07/24 22:56:55 INFO storage.MemoryStore: ensureFreeSpace(28728) called with curMem=920109320, maxMem=4322230272
> 14/07/24 22:56:55 INFO storage.MemoryStore: Block rdd_51_83 stored as values to memory (estimated size 28.1 KB, free 3.2 GB)
> 14/07/24 22:56:55 INFO storage.BlockManagerMaster: Updated info of block rdd_51_83
> 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_189_83 not found, computing it
> 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 28, fetching them
> 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://spark@tuan202:49488/user/MapOutputTracker#-1031481395]
> 14/07/24 22:56:55 INFO spark.MapOutputTrackerWorker: Got the output locations
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 1024 blocks
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 1 remote fetches in 0 ms
> 14/07/24 22:56:55 INFO spark.CacheManager: Partition rdd_50_83 not found, computing it
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1024 non-empty blocks out of 1024 blocks
> 14/07/24 22:56:55 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 58 remote fetches in 4 ms
> 14/07/24 22:57:09 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(tuan221,51153)
> 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153)
> 14/07/24 22:57:09 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan221,51153)
> 14/07/24 23:05:07 INFO network.ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@3dcc1da1
> 14/07/24 23:05:07 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(tuan211,43828)
> 14/07/24 23:05:07 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@3dcc1da1
> java.nio.channels.CancelledKeyException
>       at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:363)
>       at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
> 14/07/24 23:05:07 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(tuan211,43828)
> 14/07/24 23:05:07 ERROR network.ConnectionManager: Corresponding SendingConnectionManagerId not found
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
