Hi Jacek,

I'm not entirely sure I understand your question, but the reason
preferredLocs can be transient is that it is only used to tell the
scheduler (on the driver) where it should prefer to assign the task.  No
matter what the value is, the task could still get assigned anywhere.  By
the time the task has been assigned a location and is running on an
executor, the preference doesn't matter anymore.
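
To illustrate the transient part, here is a minimal sketch in plain Java
(not Spark code).  ToyTask and its fields are hypothetical stand-ins, and
I'm assuming ordinary Java serialization, where a transient field is
simply skipped and comes back as null after deserialization:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a task; this is NOT Spark's ShuffleMapTask.
class ToyTask implements Serializable {
    private static final long serialVersionUID = 1L;

    // Needed wherever the task runs, so it is serialized.
    final int partitionId;

    // Scheduling hint that only matters on the driver; Java serialization skips it.
    transient List<String> preferredLocs;

    ToyTask(int partitionId, List<String> preferredLocs) {
        this.partitionId = partitionId;
        this.preferredLocs = preferredLocs;
    }

    // Serialize to bytes and read back, roughly what shipping a task to an executor implies.
    static ToyTask roundTrip(ToyTask task) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ToyTask) in.readObject();
        }
    }
}

class TransientLocsDemo {
    public static void main(String[] args) throws Exception {
        ToyTask onDriver = new ToyTask(3, Arrays.asList("host1", "host2"));
        ToyTask onExecutor = ToyTask.roundTrip(onDriver);

        // The driver-side copy still has its hint; the deserialized copy does not.
        System.out.println("driver preferredLocs:   " + onDriver.preferredLocs);
        System.out.println("executor partitionId:   " + onExecutor.partitionId);
        System.out.println("executor preferredLocs: " + onExecutor.preferredLocs);
    }
}
```

The deserialized copy keeps partitionId but loses the hint, which is
harmless because nothing on the executor side needs it.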

preferredLocations are entirely independent of how a map task learns
where to fetch its input shuffle data, and of where a shuffle map task
writes its output data.  All of that info goes through MapOutputTracker.
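
As a rough illustration of that separation, here is a toy model in plain
Java.  ToyScheduler, ToyOutputTracker and their methods are hypothetical
names made up for this sketch and do not mirror Spark's real classes or
signatures.  It only shows the idea that the preference is a hint, while
the recorded output location is whatever host the task actually ran on:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry of where map outputs were actually written.
class ToyOutputTracker {
    // shuffleId -> (map partition -> host that wrote the output)
    private final Map<Integer, Map<Integer, String>> outputs = new HashMap<>();

    void registerMapOutput(int shuffleId, int mapPartition, String host) {
        outputs.computeIfAbsent(shuffleId, id -> new HashMap<>()).put(mapPartition, host);
    }

    // Returns null if nothing has been registered for this output yet.
    String locationOf(int shuffleId, int mapPartition) {
        Map<Integer, String> byPartition = outputs.get(shuffleId);
        return byPartition == null ? null : byPartition.get(mapPartition);
    }
}

// Hypothetical scheduler: honors a preference only if that host is free.
class ToyScheduler {
    static String assign(List<String> preferredLocs, List<String> freeHosts) {
        for (String host : preferredLocs) {
            if (freeHosts.contains(host)) {
                return host;
            }
        }
        // No preferred host is available, so fall back to any free host.
        return freeHosts.get(0);
    }
}

class TrackerDemo {
    public static void main(String[] args) {
        ToyOutputTracker tracker = new ToyOutputTracker();

        // The task prefers host1, but only host2 and host3 are free.
        String ranOn = ToyScheduler.assign(Arrays.asList("host1"), Arrays.asList("host2", "host3"));

        // After the task finishes, the actual location is what gets recorded.
        tracker.registerMapOutput(0, 5, ranOn);

        // A downstream reader consults the tracker, never the preference.
        System.out.println("output of map partition 5 is on " + tracker.locationOf(0, 5));
    }
}
```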

hope that helps,
Imran

On Tue, Jan 3, 2017 at 5:27 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Just found out that ShuffleMapTask has transient locs and
> preferredLocs attributes which means that when ShuffleMapTask is
> serialized (as a broadcast variable) the information is gone.
>
> Does this mean that the attributes could have not been defined at all
> since Spark uses SortShuffleManager (and BlockManagerMaster on the
> driver) to track the shuffle locations (MapStatuses)?
>
> Is my understanding correct? What am I missing? (I'm exploring shuffle
> system currently and would appreciate comments a lot!) Thanks!
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
