[ https://issues.apache.org/jira/browse/SPARK-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-3119:
-------------------------------
Description:
TorrentBroadcast is unnecessarily complicated:
1. It tracks a lot of mutable state, such as the total number of bytes and the
number of blocks fetched.
2. It has at least two data structures that are not needed: TorrentInfo and
TorrentBlock.
3. It uses getSingle on executors to get the block instead of getLocal,
resulting in an extra round trip to look up the location of the block when the
block doesn't exist yet.
4. It has a metadata block that is completely unnecessary.
5. It does an extra memory copy during deserialization to copy all the blocks
into a single giant array.
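
To make point 5 concrete, here is a minimal sketch in Python (not Spark code) of
the general idea: deserialize directly from a stream that walks the blocks in
order, rather than first joining them into one contiguous array. All names here
(ChunkedReader, split_blocks, deserialize_with_copy, deserialize_streaming) are
hypothetical and chosen for illustration only; pickle stands in for whatever
serializer is actually used.

```python
import io
import pickle


class ChunkedReader(io.RawIOBase):
    """Read-only raw stream over a list of byte blocks, without joining them."""

    def __init__(self, chunks):
        super().__init__()
        # memoryview lets us slice each block without copying it first.
        self._chunks = [memoryview(c) for c in chunks]
        self._index = 0   # index of the block currently being read
        self._offset = 0  # read position inside that block

    def readable(self):
        return True

    def readinto(self, buf):
        # Serve bytes from the current block; advance past exhausted blocks.
        while self._index < len(self._chunks):
            chunk = self._chunks[self._index]
            remaining = len(chunk) - self._offset
            if remaining > 0:
                n = min(len(buf), remaining)
                buf[:n] = chunk[self._offset:self._offset + n]
                self._offset += n
                return n
            self._index += 1
            self._offset = 0
        return 0  # end of stream


def split_blocks(payload, block_size):
    """Split a serialized payload into fixed-size blocks (last may be shorter)."""
    return [payload[i:i + block_size] for i in range(0, len(payload), block_size)]


def deserialize_with_copy(blocks):
    """The pattern point 5 objects to: build one giant array, then deserialize."""
    return pickle.loads(b"".join(blocks))


def deserialize_streaming(blocks):
    """Deserialize straight from the blocks through a small read buffer."""
    return pickle.load(io.BufferedReader(ChunkedReader(blocks)))
```

Both functions return the same object; the streaming variant still copies bytes
through a small read buffer, but it never allocates a single array the size of
the whole payload.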
> Re-implement TorrentBroadcast
> -----------------------------
>
> Key: SPARK-3119
> URL: https://issues.apache.org/jira/browse/SPARK-3119
> Project: Spark
> Issue Type: Improvement
> Reporter: Reynold Xin
> Assignee: Reynold Xin