[ 
https://issues.apache.org/jira/browse/SPARK-28778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28778.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/25500

> Shuffle jobs fail due to incorrect advertised address when running in virtual 
> network
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-28778
>                 URL: https://issues.apache.org/jira/browse/SPARK-28778
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 2.2.3, 2.3.0, 2.4.3
>            Reporter: Anton Kirillov
>            Priority: Major
>              Labels: Mesos
>             Fix For: 3.0.0
>
>
> When shuffle jobs are launched by Mesos in a virtual network, the Mesos 
> scheduler sets the executor {{--hostname}} parameter to {{0.0.0.0}} whenever 
> {{spark.mesos.network.name}} is provided. Executors then use {{0.0.0.0}} as 
> their advertised address and, when a job involves shuffle, fail to fetch 
> shuffle blocks from each other because {{0.0.0.0}} is given as the origin. 
> In a virtual network the hostname or IP address is not known up front; it is 
> assigned to a container at start time, so the executor process needs to 
> advertise the correct, dynamically assigned address to be reachable by other 
> executors.
>
> The bug described above prevents Mesos users from running any job that 
> involves shuffle when a virtual network is used, because executors advertise 
> an incorrect address and cannot fetch shuffle blocks from each other.
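
To illustrate the requirement in the description, here is a minimal sketch in
Python of how a process might avoid advertising a wildcard address. It is not
the change made in the pull request referenced above and it is not Spark code.
The function names (is_wildcard, default_local_address, advertised_address)
and the resolve_local parameter are hypothetical, and the sketch assumes the
dynamically assigned address can be obtained by resolving the container's own
hostname at start time.

```python
import ipaddress
import socket


def is_wildcard(host):
    """Return True if host is an unspecified (bind-all) IP such as 0.0.0.0 or ::."""
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal (for example a DNS name), so not a wildcard.
        return False


def default_local_address():
    """Assumption: resolving the local hostname yields the assigned address."""
    return socket.gethostbyname(socket.gethostname())


def advertised_address(configured, resolve_local=default_local_address):
    """Pick the address to advertise to peers.

    A missing or wildcard value is replaced by whatever resolve_local()
    returns; any other configured value is used unchanged.
    """
    if configured is None or is_wildcard(configured):
        return resolve_local()
    return configured
```

Under these assumptions, advertised_address("0.0.0.0") falls back to the
address resolved inside the container, while an explicit routable value such
as "10.0.0.5" is passed through unchanged. Passing the resolver as a parameter
keeps the lookup replaceable where hostname resolution is not reliable.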



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

