Hi Imran,

Yes, you're right. I stand corrected! Thanks.
This is the part that opened my eyes:

> By the time that task has been assigned a location, and it's running on an
> executor, it doesn't matter anymore.

That's why a task does not need its preferred locations after deserialization (!)

Thanks a lot. On to digging deeper...

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jan 3, 2017 at 10:08 PM, Imran Rashid <iras...@cloudera.com> wrote:
> Hi Jacek,
>
> I'm not entirely sure I understand your question, but the reason
> preferredLocs can be transient is b/c that is used to define where the
> scheduler (on the driver) should prefer to assign the task. But no matter
> the value, the task could still get assigned anywhere. By the time that
> task has been assigned a location, and it's running on an executor, it
> doesn't matter anymore.
>
> preferredLocations are entirely independent of having the map task know
> where to fetch its input shuffle data, and where the shuffle map task writes
> its output data. All of that info goes through MapOutputTracker.
>
> hope that helps,
> Imran
>
> On Tue, Jan 3, 2017 at 5:27 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi,
>>
>> Just found out that ShuffleMapTask has transient locs and
>> preferredLocs attributes, which means that when ShuffleMapTask is
>> serialized (as a broadcast variable) the information is gone.
>>
>> Does this mean that the attributes could have been left out entirely,
>> since Spark uses SortShuffleManager (and BlockManagerMaster on the
>> driver) to track the shuffle locations (MapStatuses)?
>>
>> Is my understanding correct? What am I missing? (I'm exploring the shuffle
>> system currently and would appreciate comments a lot!) Thanks!
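To make Imran's point concrete, here is a minimal Java sketch. The Scala `@transient` annotation compiles to the JVM `transient` modifier, so the serialization behaviour is the same. `ToyTask`, `assign` and `roundTrip` are hypothetical names, not Spark APIs. The placement rule is a deliberately simplified stand-in for the driver's locality-aware scheduling (real Spark uses delay scheduling tuned by `spark.locality.wait`), and plain Java serialization stands in for however the task is actually shipped to an executor. The sketch shows three things: the driver reads `preferredLocs` before serialization, the task can still land on a non-preferred host, and the deserialized copy sees `null` for the transient field while keeping what it needs to run.

```java
import java.io.*;
import java.util.*;

public class TransientLocsDemo {

    // Hypothetical stand-in for a Spark task; NOT Spark's ShuffleMapTask.
    static class ToyTask implements Serializable {
        private static final long serialVersionUID = 1L;
        final int partitionId;                  // serialized: the executor needs it
        transient List<String> preferredLocs;   // driver-only hint, dropped on serialization

        ToyTask(int partitionId, List<String> preferredLocs) {
            this.partitionId = partitionId;
            this.preferredLocs = preferredLocs;
        }
    }

    // Hypothetical, simplified driver-side placement: pick a free host that is
    // also preferred, otherwise fall back to any free host. The hint never forces placement.
    static String assign(ToyTask task, List<String> freeHosts) {
        for (String host : freeHosts) {
            if (task.preferredLocs != null && task.preferredLocs.contains(host)) {
                return host;
            }
        }
        return freeHosts.get(0);
    }

    // Java serialization round trip, standing in for shipping the task to an executor.
    static ToyTask roundTrip(ToyTask task) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ToyTask) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ToyTask task = new ToyTask(7, Arrays.asList("host-a", "host-b"));

        // Driver side: the hint is consulted here, before the task is serialized.
        System.out.println("assigned to " + assign(task, Arrays.asList("host-c", "host-b")));
        // No preferred host is free, so the task goes elsewhere anyway.
        System.out.println("fallback to " + assign(task, Arrays.asList("host-c")));

        // Executor side: the transient hint is gone, the rest of the task survives.
        ToyTask onExecutor = roundTrip(task);
        System.out.println("partitionId = " + onExecutor.partitionId);
        System.out.println("preferredLocs = " + onExecutor.preferredLocs);
    }
}
```

Running `main` should print `assigned to host-b`, `fallback to host-c`, `partitionId = 7` and `preferredLocs = null`. Where a running task does need location information (which hosts hold the map outputs it must fetch), that comes from MapOutputTracker, as Imran says, not from anything carried in the task itself.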
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org