[ 
https://issues.apache.org/jira/browse/SPARK-27651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051995#comment-17051995
 ] 

Attila Zsolt Piros commented on SPARK-27651:
--------------------------------------------

Yes, the final implementation works only when the external shuffle service is 
used, because the local directories of the other executors on the same host 
are requested from the external shuffle service. 
The initial implementation, at the time the PR was opened, used the driver to 
get the host-local directories.

The technical reasons for asking the external shuffle service were:
 * decreasing network pressure on the driver (main reason).
 * getting rid of an unbounded map from executors to local dirs (a bounded map 
would need complex fallback logic in the fetcher). Such a map would also be 
redundant, as this information is already available at the external shuffle 
service, just stored in a distributed way: each running external shuffle 
service process stores this data only for the executors on its own host. 
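
To illustrate the second point, here is a minimal sketch of the idea in Python. 
It is not Spark code: the names HostShuffleService, register_executor, 
get_local_dirs and plan_fetch are hypothetical, and falling back to the network 
when the local dirs are unknown is an assumption of the sketch. It only shows 
that each shuffle service holds the local dirs of the executors on its own 
host, so no global executor-to-dirs map is needed.

```python
class HostShuffleService:
    """Hypothetical stand-in for one external shuffle service process."""

    def __init__(self, host):
        self.host = host
        # executor id -> local directories, only for executors on this host
        self.local_dirs = {}

    def register_executor(self, exec_id, dirs):
        self.local_dirs[exec_id] = list(dirs)

    def get_local_dirs(self, exec_ids):
        # Answer only for executors registered with this service.
        return {e: self.local_dirs[e] for e in exec_ids if e in self.local_dirs}


def plan_fetch(reader_host, block_host, block_exec_id, services):
    """Return ("local-disk", dirs) for a host-local block, else ("network", None)."""
    if block_host != reader_host:
        return ("network", None)
    # Same host: ask the shuffle service on that host, not the driver.
    dirs = services[reader_host].get_local_dirs([block_exec_id])
    if block_exec_id not in dirs:
        # Assumption of this sketch: unknown dirs fall back to the network.
        return ("network", None)
    return ("local-disk", dirs[block_exec_id])


services = {h: HostShuffleService(h) for h in ("host-a", "host-b")}
services["host-a"].register_executor("exec-1", ["/tmp/spark-a/blockmgr-1"])
services["host-b"].register_executor("exec-2", ["/tmp/spark-b/blockmgr-2"])

same_host = plan_fetch("host-a", "host-a", "exec-1", services)
other_host = plan_fetch("host-a", "host-b", "exec-2", services)
print(same_host)
print(other_host)
```

In this sketch the state kept by each service is bounded by the number of 
executors on its host, and the driver is not involved in the lookup at all.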

> Avoid the network when block manager fetches shuffle blocks from the same host
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-27651
>                 URL: https://issues.apache.org/jira/browse/SPARK-27651
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager
>    Affects Versions: 3.0.0
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
>             Fix For: 3.0.0
>
>
> When a shuffle block (content) is fetched, the network is always used, even 
> when it is fetched from an executor (or the external shuffle service) running 
> on the same host.


