[
https://issues.apache.org/jira/browse/SPARK-56199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-56199:
-----------------------------------
Labels: pull-request-available (was: )
> ShuffleBlockFetcherIterator should not read FallbackStorage blocks as local
> blocks
> ---------------------------------------------------------------------------------
>
> Key: SPARK-56199
> URL: https://issues.apache.org/jira/browse/SPARK-56199
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.2.0
> Reporter: Enrico Minack
> Priority: Major
> Labels: pull-request-available
>
> The ShuffleBlockFetcherIterator treats blocks stored on the FallbackStorage
> (very likely a remote distributed storage like S3 or HDFS) as local block
> files.
> The current implementation has the following disadvantages:
> - blocks are read from the fallback storage single-threaded and synchronously
> (one by one)
> - waiting for fallback storage blocks to be transferred over the network
> blocks the reading of local blocks
> - fallback storage blocks are read in the {{catch}} branch after the attempt
> to read them as local blocks fails (they cannot be found locally)
> Fallback storage blocks should be treated like remote blocks: fetched
> independently of local blocks, asynchronously and multi-threaded.
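
The difference between the two behaviours described above can be sketched with a toy model. This is not Spark code: the block maps, the simulated latency, and the functions `read_local`, `read_fallback`, `fetch_sequential` and `fetch_concurrent` are all hypothetical names invented for illustration, and Python is used only to keep the sketch short. It assumes that fallback reads are slow network calls and that local reads are fast.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for block locations; none of these names exist in Spark.
LOCAL_BLOCKS = {"shuffle_0_0_0": b"local-0", "shuffle_0_1_0": b"local-1"}
FALLBACK_BLOCKS = {"shuffle_0_2_0": b"fallback-2", "shuffle_0_3_0": b"fallback-3"}
FALLBACK_LATENCY_S = 0.05  # assumed network round trip per fallback read


def read_local(block_id):
    """Read a block from local disk; raises KeyError if it is not local."""
    return LOCAL_BLOCKS[block_id]


def read_fallback(block_id):
    """Read a block from the simulated remote fallback storage."""
    time.sleep(FALLBACK_LATENCY_S)
    return FALLBACK_BLOCKS[block_id]


def fetch_sequential(block_ids):
    """Behaviour described in the issue: the fallback read happens in the
    exception branch of the local read, one block at a time."""
    results = {}
    for block_id in block_ids:
        try:
            results[block_id] = read_local(block_id)
        except KeyError:
            # Every later block waits for this network transfer to finish.
            results[block_id] = read_fallback(block_id)
    return results


def fetch_concurrent(block_ids, max_threads=4):
    """Proposed behaviour: classify blocks up front, fetch fallback blocks
    on a thread pool, and read local blocks while those fetches run."""
    local_ids = [b for b in block_ids if b in LOCAL_BLOCKS]
    fallback_ids = [b for b in block_ids if b not in LOCAL_BLOCKS]
    results = {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # Submit all fallback reads first so they overlap with local reads.
        futures = {pool.submit(read_fallback, b): b for b in fallback_ids}
        for block_id in local_ids:
            results[block_id] = read_local(block_id)
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Both functions return the same blocks. In the sequential variant the total wait grows with the number of fallback blocks, while in the concurrent variant the fallback reads overlap with each other and with the local reads, up to `max_threads` at a time.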
--
This message was sent by Atlassian Jira
(v8.20.10#820010)