Hi Jacek,

I'm not entirely sure I understand your question, but the reason preferredLocs can be transient is that it is only used to tell the scheduler (on the driver) where it should prefer to assign the task. Whatever its value, the task could still get assigned anywhere. Once the task has been assigned a location and it's running on an executor, the value doesn't matter anymore.
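To illustrate the transient point outside of Spark, here is a minimal sketch using plain Java serialization. DemoTask, roundTrip and the host names are made up for the example and are not Spark classes; the assumption is only that Scala's @transient compiles down to the same JVM transient modifier, so the field is skipped when the object is written out and comes back as null on the other side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class TransientLocsDemo {

    // Hypothetical stand-in for a task; this is NOT Spark's ShuffleMapTask.
    static class DemoTask implements Serializable {
        private static final long serialVersionUID = 1L;

        final int partitionId;

        // Scheduling hint only: Java serialization skips transient fields.
        transient List<String> preferredLocs;

        DemoTask(int partitionId, List<String> preferredLocs) {
            this.partitionId = partitionId;
            this.preferredLocs = preferredLocs;
        }
    }

    // Serialize and deserialize in memory, mimicking "driver sends task to executor".
    static DemoTask roundTrip(DemoTask task) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DemoTask) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DemoTask onDriver = new DemoTask(3, Arrays.asList("host1", "host2"));
        DemoTask onExecutor = roundTrip(onDriver);

        System.out.println(onDriver.preferredLocs);   // [host1, host2]
        System.out.println(onExecutor.partitionId);   // 3
        System.out.println(onExecutor.preferredLocs); // null
    }
}
```

The copy that went through serialization still knows which partition it is, which is all it needs to do its work; the location hint stays behind with the original object, where the scheduling decision was made.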
preferredLocations are entirely independent of how the map task learns where to fetch its input shuffle data, and of where the shuffle map task writes its output data. All of that info goes through MapOutputTracker.

Hope that helps,
Imran

On Tue, Jan 3, 2017 at 5:27 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Just found out that ShuffleMapTask has transient locs and
> preferredLocs attributes which means that when ShuffleMapTask is
> serialized (as a broadcast variable) the information is gone.
>
> Does this mean that the attributes could have not been defined at all
> since Spark uses SortShuffleManager (and BlockManagerMaster on the
> driver) to track the shuffle locations (MapStatuses)?
>
> Is my understanding correct? What am I missing? (I'm exploring shuffle
> system currently and would appreciate comments a lot!) Thanks!
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski