[
https://issues.apache.org/jira/browse/SPARK-27651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051995#comment-17051995
]
Attila Zsolt Piros commented on SPARK-27651:
--------------------------------------------
Yes, the final implementation works only when the external shuffle service is
used, because the local directories of the other executors on the same host
are requested from the external shuffle service.
The initial implementation, at the time the PR was opened, used the driver to
get the host-local directories.
The technical reasons for asking the external shuffle service instead were:
* decreasing network pressure on the driver (main reason).
* getting rid of a map from executors to local dirs that is either unbounded
or, if bounded, needs complex fallback logic at the fetcher. Such a map would
also be redundant, as this information is already available at the external
shuffle service, just stored in a distributed way: each running external
shuffle service process stores the data only for the executors on its own
host.
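
To illustrate the second point, here is a minimal sketch of the idea in
Python. It is not Spark's actual code: HostShuffleService, register_executor,
get_local_dirs and plan_fetch are hypothetical names made up for this example.
It only shows that each per-host service knows the local dirs of its own
executors, so a fetcher can ask its local service and needs no global map.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HostShuffleService:
    """Hypothetical stand-in for the shuffle service process on one host."""
    host: str
    # executor id -> local dirs, only for executors registered on this host
    local_dirs: Dict[str, List[str]] = field(default_factory=dict)

    def register_executor(self, executor_id: str, dirs: List[str]) -> None:
        self.local_dirs[executor_id] = list(dirs)

    def get_local_dirs(self, executor_ids) -> Dict[str, List[str]]:
        # Executors that are not on this host are unknown and simply omitted.
        return {e: self.local_dirs[e] for e in executor_ids if e in self.local_dirs}


def plan_fetch(my_host: str,
               block_locations: Dict[str, Tuple[str, str]],
               services: Dict[str, HostShuffleService]):
    """Split blocks into host-local disk reads and remote network fetches.

    block_locations maps a block id to (host, executor id).
    """
    same_host_execs = {e for (h, e) in block_locations.values() if h == my_host}
    # One request to the service on our own host; the driver is not involved.
    dirs = services[my_host].get_local_dirs(same_host_execs)

    disk_reads: Dict[str, List[str]] = {}
    network_fetches: List[str] = []
    for block_id, (host, exec_id) in block_locations.items():
        if host == my_host and exec_id in dirs:
            disk_reads[block_id] = dirs[exec_id]
        else:
            network_fetches.append(block_id)
    return disk_reads, network_fetches
```

In this sketch, a block whose executor is on another host, or is unknown to
the local service, falls back to the normal network fetch.
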
> Avoid the network when block manager fetches shuffle blocks from the same host
> ------------------------------------------------------------------------------
>
> Key: SPARK-27651
> URL: https://issues.apache.org/jira/browse/SPARK-27651
> Project: Spark
> Issue Type: Improvement
> Components: Block Manager
> Affects Versions: 3.0.0
> Reporter: Attila Zsolt Piros
> Assignee: Attila Zsolt Piros
> Priority: Major
> Fix For: 3.0.0
>
>
> When a shuffle block (content) is fetched the network is always used even
> when it is fetched from an executor (or the external shuffle service) running
> on the same host.