Hi Brad and Nick,
Thanks for the comments! I opened a ticket to get a more thorough
explanation of data locality into the docs here:
https://issues.apache.org/jira/browse/SPARK-3526
If you could put any other unanswered questions you have about data
locality on that ticket, I'll try to answer them there.
Hi Andrew,
I agree with Nicholas. That was a nice, concise summary of the
meaning of the locality customization options, indicators, and default
Spark behaviors. I haven't combed through the documentation
end-to-end in a while, but I'm also not sure that information is
presently represented there.
Andrew,
This email was pretty helpful. I feel like this stuff should be summarized
in the docs somewhere, or perhaps in a blog post.
Do you know if it is?
Nick
On Thu, Jun 5, 2014 at 6:36 PM, Andrew Ash and...@andrewash.com wrote:
The locality is how close the data is to the code that's processing it.
Another observation I had was that reading over the local filesystem with
"file://" was stated as PROCESS_LOCAL, which was confusing.
Regards,
Liming
On 13 Sep, 2014, at 3:12 am, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
I noticed that sometimes tasks would switch from PROCESS_LOCAL (I'd assume
that this means fully cached) to NODE_LOCAL or even RACK_LOCAL.
When this happens, things get extremely slow.
Does this mean that the executor got terminated and restarted?
Is there a way to prevent this from happening?
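One knob that is commonly pointed to for this (a sketch; the 10s/5s values below are illustrative, not recommendations — the Spark default for these waits is 3s) is the spark.locality.wait family, which controls how long the scheduler waits for a slot at a given locality level before falling back to the next one:

```properties
# spark-defaults.conf — sketch; wait values here are illustrative
# How long to wait for a better-locality slot before falling back a level
spark.locality.wait          10s
# Per-level overrides (these default to spark.locality.wait if unset)
spark.locality.wait.process  10s
spark.locality.wait.node     10s
spark.locality.wait.rack     5s
```

Raising these trades scheduling delay for locality: tasks sit idle longer waiting for a PROCESS_LOCAL slot instead of running immediately at NODE_LOCAL or RACK_LOCAL.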
On a related note, I'd also like to minimize any kind of executor movement. I.e.,
once an executor is spawned and data is cached in the executor, I want that
executor to live all the way until the job is finished, or the machine fails
in a fatal manner.
What would be the best way to ensure that this is the case?
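In cluster managers that support it, one way to keep a fixed set of executors for the whole application (a sketch; assumes a YARN deployment on a Spark version with dynamic allocation available, and the instance count is illustrative) is to disable dynamic allocation and request a static number of executors:

```properties
# spark-defaults.conf — sketch; keep executors for the app's lifetime
# Don't let the scheduler add/remove executors as load changes
spark.dynamicAllocation.enabled  false
# Fixed executor count (applies on YARN; assumption: YARN deployment)
spark.executor.instances         10
```

With static allocation, executors normally live until the application ends; they should only disappear on machine or process failure.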
Sent: Friday, June 06, 2014 6:53 AM
To: user@spark.apache.org
Subject: Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or
RACK_LOCAL?
Additionally, I've encountered a confusing situation where the locality
level for a task showed up as 'PROCESS_LOCAL' even though I didn't cache
the data.