Hi Imran,

Yes, you're right. I stand corrected! Thanks.

This is the part that opened my eyes:

> By the time that task has been assigned a location, and it's running on an 
> executor, it doesn't matter anymore.

That's why a task does not need preferredLocs after deserialization (!)
Thanks a lot.
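
To convince myself, here is a minimal sketch in plain Java of what a transient field does across serialization. This is not Spark's code: SketchTask, TransientDemo and roundTrip are made-up names, and I'm assuming plain Java serialization, with Scala's @transient mapping to the same JVM transient modifier. The scheduling hint exists on the "driver" side and is simply gone on the "executor" side, while the partition id survives.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a task; NOT Spark's actual ShuffleMapTask.
class SketchTask implements Serializable {
    private static final long serialVersionUID = 1L;

    final int partitionId;                      // still needed after deserialization
    final transient List<String> preferredLocs; // scheduling hint, skipped by serialization

    SketchTask(int partitionId, List<String> preferredLocs) {
        this.partitionId = partitionId;
        this.preferredLocs = preferredLocs;
    }
}

class TransientDemo {
    // Serializes the task to bytes and reads it back, as if it had been shipped elsewhere.
    static SketchTask roundTrip(SketchTask task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchTask) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchTask onDriver = new SketchTask(3, Arrays.asList("host1", "host2"));
        SketchTask onExecutor = roundTrip(onDriver);

        System.out.println(onDriver.preferredLocs);   // prints [host1, host2]
        System.out.println(onExecutor.partitionId);   // prints 3
        System.out.println(onExecutor.preferredLocs); // prints null
    }
}
```

The hint is only useful while the scheduler is picking a location, so dropping it from the serialized form loses nothing the running task needs.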

On to digging deeper...

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jan 3, 2017 at 10:08 PM, Imran Rashid <iras...@cloudera.com> wrote:
> Hi Jacek,
>
> I'm not entirely sure I understand your question, but the reason
> preferredLocs can be transient is b/c that is used to define where the
> scheduler (on the driver) should prefer to assign the task.  But no matter
> the value, the task could still get assigned anywhere.  By the time that
> task has been assigned a location, and it's running on an executor, it
> doesn't matter anymore.
>
> preferredLocations are entirely independent of having the map task know
> where to fetch its input shuffle data, and where the shuffle map task writes
> its output data.  All of that info goes through MapOutputTracker.
>
> hope that helps,
> Imran
>
> On Tue, Jan 3, 2017 at 5:27 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi,
>>
>> Just found out that ShuffleMapTask has transient locs and
>> preferredLocs attributes which means that when ShuffleMapTask is
>> serialized (as a broadcast variable) the information is gone.
>>
>> Does this mean that the attributes need not have been defined at all
>> since Spark uses SortShuffleManager (and BlockManagerMaster on the
>> driver) to track the shuffle locations (MapStatuses)?
>>
>> Is my understanding correct? What am I missing? (I'm currently exploring
>> the shuffle system and would appreciate comments a lot!) Thanks!
>>
>> Regards,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>

