Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-09-15 Thread Andrew Ash
Hi Brad and Nick, Thanks for the comments! I opened a ticket to get a more thorough explanation of data locality into the docs here: https://issues.apache.org/jira/browse/SPARK-3526 If you could put any other unanswered questions you have about data locality on that ticket I'll try to

Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-09-14 Thread Brad Miller
Hi Andrew, I agree with Nicholas. That was a nice, concise summary of the meaning of the locality customization options, indicators and default Spark behaviors. I haven't combed through the documentation end-to-end in a while, but I'm also not sure that information is presently represented

Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-09-12 Thread Nicholas Chammas
Andrew, This email was pretty helpful. I feel like this stuff should be summarized in the docs somewhere, or perhaps in a blog post. Do you know if it is? Nick On Thu, Jun 5, 2014 at 6:36 PM, Andrew Ash and...@andrewash.com wrote: The locality is how close the data is to the code that's

Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-09-12 Thread Tsai Li Ming
Another observation I had was reading over local filesystem with “file://“. it was stated as PROCESS_LOCAL which was confusing. Regards, Liming On 13 Sep, 2014, at 3:12 am, Nicholas Chammas nicholas.cham...@gmail.com wrote: Andrew, This email was pretty helpful. I feel like this stuff

When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-06-05 Thread Sung Hwan Chung
I noticed that sometimes tasks would switch from PROCESS_LOCAL (I'd assume that this means fully cached) to NODE_LOCAL or even RACK_LOCAL. When these happen things get extremely slow. Does this mean that the executor got terminated and restarted? Is there a way to prevent this from happening

Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-06-05 Thread Sung Hwan Chung
On a related note, I'd also minimize any kind of executor movement. I.e., once an executor is spawned and data cached in the executor, I want that executor to live all the way till the job is finished, or the machine fails in a fatal manner. What would be the best way to ensure that this is the

RE: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-06-05 Thread Liu, Raymond
Sent: Friday, June 06, 2014 6:53 AM To: user@spark.apache.org Subject: Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL? Additionally, I've encountered some confusing situation where the locality level for a task showed up as 'PROCESS_LOCAL' even though I didn't cache