[ https://issues.apache.org/jira/browse/SPARK-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Or resolved SPARK-13328.
-------------------------------
       Resolution: Fixed
         Assignee: Nezih Yigitbasi
    Fix Version/s: 2.0.0
 Target Version/s: 2.0.0

> Possible poor read performance for broadcast variables with dynamic resource allocation
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-13328
>                 URL: https://issues.apache.org/jira/browse/SPARK-13328
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>            Reporter: Nezih Yigitbasi
>            Assignee: Nezih Yigitbasi
>             Fix For: 2.0.0
>
> When dynamic resource allocation is enabled, fetching broadcast variables from removed executors caused job failures. SPARK-9591 fixed this by trying all locations of a block before giving up. However, the locations of a block are retrieved from the driver only once in this process, and the entries in that list can become stale due to dynamic resource allocation. The situation gets worse on a large cluster, where the location list can contain several hundred entries, tens of which may be stale. With the default settings of 3 max retries and a 5 s wait between retries (i.e., 15 s per location), we have observed reads of a broadcast variable taking as long as ~17 min (the log below shows the failed 70th block-fetch attempt, where each attempt takes 15 s):
> {code}
> ...
> 16/02/13 01:02:27 WARN storage.BlockManager: Failed to fetch remote block broadcast_18_piece0 from BlockManagerId(8, ip-10-178-77-38.ec2.internal, 60675) (failed attempt 70)
> ...
> 16/02/13 01:02:27 INFO broadcast.TorrentBroadcast: Reading broadcast variable 18 took 1051049 ms
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
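The worst-case arithmetic in the report can be checked with a small back-of-the-envelope model (a sketch, not Spark code; the function name and parameters are illustrative, with defaults taken from the retry settings the report describes, which correspond to the Netty transfer options `spark.shuffle.io.maxRetries` and `spark.shuffle.io.retryWait`):

```python
def worst_case_fetch_seconds(failed_attempts: int,
                             max_retries: int = 3,
                             retry_wait_s: int = 5) -> int:
    """Seconds spent retrying stale block locations before a fetch succeeds.

    Each attempt against a stale location costs max_retries * retry_wait_s
    seconds (3 * 5 s = 15 s with the defaults) before the fetcher moves on.
    """
    return failed_attempts * max_retries * retry_wait_s

# The log shows the 70th failed attempt; at 15 s per location that is
# 70 * 15 s = 1050 s, consistent with the reported "took 1051049 ms".
print(worst_case_fetch_seconds(70))  # 1050
```

This is why the read time scales with the number of stale entries in the (fetched-once) location list rather than with the size of the broadcast variable itself.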