Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Andrew Or
@Jerry Lam Can someone confirm if it is true that dynamic allocation on Mesos "is designed to run one executor per slave with the configured amount of resources." I copied this sentence from the documentation. Does this mean there is at most 1 executor per node? Therefore, if you have a

Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Jerry Lam
Hi Andrew, Thank you for confirming this. I'm referring to this because I used fine-grained mode before and it was a headache because of the memory issue. Therefore, I switched to YARN with dynamic allocation. I was wondering whether I could switch back to Mesos with coarse-grained mode + dynamic

Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Jerry Lam
@Andrew Or I assume you are referring to this ticket [SPARK-5095]: https://issues.apache.org/jira/browse/SPARK-5095 Thank you! Best Regards, Jerry > On Nov 23, 2015, at 2:41 PM, Andrew Or wrote: > @Jerry Lam

Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Iulian Dragoș
On Sat, Nov 21, 2015 at 3:37 AM, Adam McElwee wrote: > I've used fine-grained mode on our Mesos Spark clusters until this week, mostly because it was the default. I started trying coarse-grained because of the recent chatter on the mailing list about wanting to move the

Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Jerry Lam
Hi guys, Can someone confirm if it is true that dynamic allocation on Mesos "is designed to run one executor per slave with the configured amount of resources." I copied this sentence from the documentation. Does this mean there is at most 1 executor per node? Therefore, if you have a big
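[For readers following this thread, a minimal sketch of the setup under discussion: coarse-grained Mesos with dynamic allocation enabled. The property names are standard Spark configuration keys of the 1.5/1.6 era; the master URL and resource sizes are hypothetical, and whether this yields at most one executor per slave is precisely the question raised here.]

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: coarse-grained Mesos plus dynamic allocation (Spark 1.5/1.6-era
    // settings). Dynamic allocation also requires an external shuffle service
    // running on each slave so shuffle files outlive executors.
    val conf = new SparkConf()
      .setAppName("mesos-dynamic-allocation-sketch")
      .setMaster("mesos://zk://host:2181/mesos")      // hypothetical master URL
      .set("spark.mesos.coarse", "true")              // coarse-grained mode
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")
      .set("spark.executor.memory", "4g")             // hypothetical sizing
      .set("spark.cores.max", "32")                   // hypothetical cap
    val sc = new SparkContext(conf)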

Re: Spark-1.6.0-preview2 trackStateByKey exception restoring state

2015-11-23 Thread Tathagata Das
My intention is to make it compatible! Filed this bug - https://issues.apache.org/jira/browse/SPARK-11932 Looking into it right now. Thanks for testing it out and reporting this! On Mon, Nov 23, 2015 at 7:22 AM, jan wrote: > Hi guys, > > I'm trying out the new trackStateByKey

Re: load multiple directory using dataframe load

2015-11-23 Thread Fengdong Yu
hiveContext.read.format("orc").load("bypath/*") > On Nov 24, 2015, at 1:07 PM, Renu Yadav wrote: > Hi, I am using dataframe and want to load orc files using multiple directories, like this: hiveContext.read.format.load("mypath/3660,myPath/3661") > but it is not
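[To make the suggestion above concrete, a short sketch of two ways to read ORC data from more than one directory; hiveContext is assumed to be an existing HiveContext, and the paths are the hypothetical ones from the question.]

    // 1. A glob pattern, as suggested above, when the directories share a parent:
    val dfGlob = hiveContext.read.format("orc").load("mypath/366*")

    // 2. Read each directory separately and union the results
    //    (unionAll in Spark 1.x; union in 2.x):
    val df1 = hiveContext.read.format("orc").load("mypath/3660")
    val df2 = hiveContext.read.format("orc").load("mypath/3661")
    val dfAll = df1.unionAll(df2)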

Fastest way to build Spark from scratch

2015-11-23 Thread Nicholas Chammas
Say I want to build a complete Spark distribution against Hadoop 2.6+ as fast as possible from scratch. This is what I'm doing at the moment: ./make-distribution.sh -T 1C -Phadoop-2.6. The -T 1C flag instructs Maven to spin up 1 thread per available core. This takes around 20 minutes on an m3.large

Re: A proposal for Spark 2.0

2015-11-23 Thread Reynold Xin
I actually think the next one (after 1.6) should be Spark 2.0. The reason is that I already know we have to break some part of the DataFrame/Dataset API as part of the Dataset design. (e.g. DataFrame.map should return Dataset rather than RDD). In that case, I'd rather break this sooner (in one
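[A small illustration of the break being proposed. The 1.x return type is the current behavior; the 2.0 line is a sketch of the proposal, not a shipped API.]

    // Spark 1.x: map on a DataFrame drops back to an RDD, leaving the
    // Catalyst-optimized world.
    val df: org.apache.spark.sql.DataFrame = ???   // placeholder DataFrame
    val rdd: org.apache.spark.rdd.RDD[Int] = df.map(row => row.getInt(0))

    // Proposed for 2.0 (hypothetical shape, per the thread): map would
    // return a Dataset instead, keeping the result optimizable.
    // val ds: org.apache.spark.sql.Dataset[Int] = df.map(_.getInt(0))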

why does shuffle in spark write shuffle data to disk by default?

2015-11-23 Thread huan zhang
Hi All, I'm wondering why shuffle in Spark writes shuffle data to disk by default. On Stack Overflow, someone said it's used for FTS, but a node going down is the most common kind of fault, and writing to local disk cannot provide fault tolerance in that case either. So why not use a ramdisk as the default instead of

Re: why does shuffle in spark write shuffle data to disk by default?

2015-11-23 Thread Reynold Xin
I think for most jobs the bottleneck isn't in writing shuffle data to disk, since shuffle data needs to be "shuffled" and sent across the network. You can always use a ramdisk yourself. Requiring a ramdisk by default would significantly complicate configuration and platform portability. On Mon,
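[For anyone who wants to try the suggestion above, a minimal sketch. spark.local.dir is the standard property controlling where shuffle and spill files are written; the tmpfs mount point is hypothetical, and on YARN the cluster manager's local directories take precedence over this setting.]

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: point Spark's scratch space at a RAM-backed filesystem.
    // Assumes /mnt/ramdisk is already mounted as tmpfs on every node.
    val conf = new SparkConf()
      .setAppName("ramdisk-shuffle-sketch")
      .set("spark.local.dir", "/mnt/ramdisk")   // shuffle/spill files land here
    val sc = new SparkContext(conf)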

Datasets on experimental dataframes?

2015-11-23 Thread Jakob Odersky
Hi, Datasets are being built upon the experimental DataFrame API; does this mean DataFrames won't be experimental in the near future? Thanks, --Jakob

Re: Datasets on experimental dataframes?

2015-11-23 Thread Reynold Xin
The experimental tag is intended for user-facing APIs. It has nothing to do with internal dependencies. On Monday, November 23, 2015, Jakob Odersky wrote: > Hi, Datasets are being built upon the experimental DataFrame API, does this mean DataFrames won't be
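[For context, the tag in question is Spark's @Experimental source annotation. A sketch of how it marks a public API; the class below is hypothetical, standing in for any user-facing Spark class.]

    import org.apache.spark.annotation.Experimental

    /**
     * :: Experimental ::
     * A hypothetical user-facing API. The tag warns users that the interface
     * may change between releases; it says nothing about what the class is
     * built on internally.
     */
    @Experimental
    class ShinyNewApi {
      def doThing(): Unit = ()
    }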

Spark-1.6.0-preview2 trackStateByKey exception restoring state

2015-11-23 Thread jan
Hi guys, I'm trying out the new trackStateByKey API of the Spark-1.6.0-preview2 release and I'm encountering an exception when trying to restore previously checkpointed state in Spark Streaming. Use case: - execute a stateful Spark Streaming job using trackStateByKey - interrupt / kill the job -
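[For readers unfamiliar with the API, a sketch of the pattern described above, based on the 1.6.0-preview2 examples; trackStateByKey was later renamed mapWithState in the final 1.6.0, so exact signatures may differ. The socket source, paths, and counting logic are hypothetical.]

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming._

    val checkpointDir = "hdfs:///tmp/track-state-checkpoint"  // hypothetical path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("trackStateByKey-sketch")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)

      val words = ssc.socketTextStream("localhost", 9999).map(w => (w, 1))

      // Running count per key, kept in Spark's state store.
      def trackingFunc(time: Time, key: String, value: Option[Int],
                       state: State[Long]): Option[(String, Long)] = {
        val sum = value.getOrElse(0).toLong + state.getOption.getOrElse(0L)
        state.update(sum)
        Some((key, sum))
      }
      words.trackStateByKey(StateSpec.function(trackingFunc _)).print()
      ssc
    }

    // Restoring from the checkpoint on restart -- the step where the
    // reported exception occurs.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()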

Re: Removing the Mesos fine-grained mode

2015-11-23 Thread Adam McElwee
On Mon, Nov 23, 2015 at 7:36 AM, Iulian Dragoș wrote: > On Sat, Nov 21, 2015 at 3:37 AM, Adam McElwee wrote: >> I've used fine-grained mode on our Mesos Spark clusters until this week, mostly because it was the default. I started trying

Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-23 Thread mkhaitman
Nice! Built and testing on CentOS 7 on a Hadoop 2.7.1 cluster. One thing I've noticed is that KeyboardInterrupts are now ignored? Is that intended? I started typing a line out and then changed my mind and wanted to issue the good old ctrl+c to interrupt, but that didn't work. Otherwise haven't

question about combining small input splits

2015-11-23 Thread Nezih
Hi Spark Devs, I tried getting an answer to my question on the user mailing list, but so far I couldn't. That's why I wanted to try the dev mailing list too, in case someone can help me. I have a Hive table that has a lot of small parquet files and I am creating a data frame out of it to do some
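[No answer appears in this digest, but the common workaround at the time was to collapse the partition count right after reading, since each small file otherwise becomes its own input split and task; a minimal sketch, with the table name and target partition count hypothetical.]

    // Sketch: a Hive table backed by many small parquet files produces one
    // partition per file; coalesce merges them without a shuffle.
    val df = hiveContext.table("my_db.many_small_files")
    println(s"partitions before: ${df.rdd.partitions.length}")

    val combined = df.coalesce(64)   // hypothetical target partition count
    combined.groupBy("some_column").count().show()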

Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-23 Thread Dean Wampler
I'm seeing an RPC timeout with the 2.11 build, but not the Hadoop1, 2.10 build: the following session with two uses of sc.parallelize causes it almost every time. Occasionally I don't see the stack trace, and I don't see it with just a single sc.parallelize, even the bigger, second one. When the
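[The actual session is truncated in this digest; purely as illustration, a hypothetical spark-shell snippet of the shape described (two parallelize calls, the second larger).]

    // Hypothetical reconstruction -- not the reporter's actual session.
    val small = sc.parallelize(1 to 1000)
    small.count()

    val bigger = sc.parallelize(1 to 100000000, numSlices = 200)
    bigger.count()   // the RPC timeout was reportedly seen with sessions like this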