releasing Spark 1.4.2

2015-11-15 Thread Niranda Perera
Hi, I am wondering when Spark 1.4.2 will be released. Is it in the voting stage at the moment? Regards -- Niranda @n1r44 +94-71-554-8430 https://pythagoreanscript.wordpress.com/

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread Reynold Xin
No it does not -- although it'd benefit from some of the work to make shuffle more robust. On Sun, Nov 15, 2015 at 10:45 PM, kiran lonikar wrote: > So does not benefit from Project Tungsten right? > > > On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin wrote: > >> It's a completely different path.

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread kiran lonikar
So it does not benefit from Project Tungsten, right? On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin wrote: > It's a completely different path. > > > On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote: > >> I would like to know if Hive on Spark uses or shares the execution code >> with Spark SQL

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread Reynold Xin
It's a completely different path. On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote: > I would like to know if Hive on Spark uses or shares the execution code > with Spark SQL or DataFrames? > > More specifically, does Hive on Spark benefit from the changes made to > Spark SQL, project Tung

Hive on Spark Vs Spark SQL

2015-11-15 Thread kiran lonikar
I would like to know if Hive on Spark uses or shares the execution code with Spark SQL or DataFrames. More specifically, does Hive on Spark benefit from the changes made to Spark SQL, such as Project Tungsten? Or is it a completely different execution path, where it creates its own plan and executes on RDDs?

Re: Support for local disk columnar storage for DataFrames

2015-11-15 Thread Reynold Xin
This (updates) is something we are going to think about in the next release or two. On Thu, Nov 12, 2015 at 8:57 AM, Cristian O wrote: > Sorry, apparently only replied to Reynold, meant to copy the list as well, > so I'm self replying and taking the opportunity to illustrate with an > example. >

Hive Context incompatible with Sentry enabled Cluster

2015-11-15 Thread Charmee Patel
Hi, We have recently run into this issue: https://issues.apache.org/jira/browse/SPARK-9042 My organization's application reads raw data from files, processes/cleanses it and pushes the results to Hive tables. To keep reads efficient, we have partitioned our tables. In a Sentry enabled cluster, ou

Re: A proposal for Spark 2.0

2015-11-15 Thread Prashant Sharma
Hey Matei, > Regarding Scala 2.12, we should definitely support it eventually, but I > don't think we need to block 2.0 on that because it can be added later too. > Has anyone investigated what it would take to run on there? I imagine we > don't need many code changes, just maybe some REPL stuff.

Re: Are map tasks spilling data to disk?

2015-11-15 Thread Reynold Xin
It depends on what the next operator is. If the next operator is just an aggregation, then no, the hash join won't write anything to disk. It will just stream the data through to the next operator. If the next operator is shuffle (exchange), then yes. On Sun, Nov 15, 2015 at 10:52 AM, gsvic wrote
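The distinction in the reply above can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Spark's implementation: the function names (`broadcast_hash_join`, `aggregate_count`, `shuffle_write`), the pickle files, and the sample rows are all made up for illustration. The join is a generator, so when the next operator is an aggregation the rows stream straight through; only the shuffle-like step writes anything to disk.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def broadcast_hash_join(stream_rows, small_rows):
    """Join a streamed side against a small in-memory side on the first field."""
    # Build a hash table from the small ("broadcast") side.
    table = defaultdict(list)
    for key, value in small_rows:
        table[key].append(value)
    # Probe row by row; each match is yielded immediately, nothing is buffered.
    for key, value in stream_rows:
        for other in table.get(key, []):
            yield key, value, other


def aggregate_count(joined_rows):
    """Next operator is an aggregation: it consumes the join output as a stream."""
    counts = defaultdict(int)
    for key, _, _ in joined_rows:
        counts[key] += 1
    return dict(counts)


def shuffle_write(joined_rows, num_partitions, out_dir):
    """Next operator is a shuffle: rows are partitioned by key and written to files."""
    buckets = defaultdict(list)
    for row in joined_rows:
        buckets[hash(row[0]) % num_partitions].append(row)
    paths = []
    for pid, rows in sorted(buckets.items()):
        path = os.path.join(out_dir, f"part-{pid}.pkl")
        with open(path, "wb") as f:
            pickle.dump(rows, f)
        paths.append(path)
    return paths


large = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
small = [(1, "x"), (2, "y")]

# Case 1: join feeds an aggregation; no files are created.
counts = aggregate_count(broadcast_hash_join(large, small))

# Case 2: join feeds a shuffle; the joined rows end up in per-partition files.
with tempfile.TemporaryDirectory() as tmp:
    files = shuffle_write(broadcast_hash_join(large, small), 2, tmp)
    rows_on_disk = 0
    for p in files:
        with open(p, "rb") as f:
            rows_on_disk += len(pickle.load(f))
```

In this model the same join generator is reused in both cases; whether data touches disk is decided entirely by the consumer, which mirrors the point that it depends on the next operator.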

Map Tasks - Disk Spill (?)

2015-11-15 Thread gsvic
According to this paper, Spark's map tasks write their results to disk. My actual question is, in BroadcastHashJoin

Are map tasks spilling data to disk?

2015-11-15 Thread gsvic
According to this paper, Spark's map tasks write their results to disk. My actual question is, in BroadcastHashJoin

Re: spark 1.4 GC issue

2015-11-15 Thread Ted Yu
Please take a look at http://www.infoq.com/articles/tuning-tips-G1-GC Cheers On Sat, Nov 14, 2015 at 10:03 PM, Renu Yadav wrote: > I have tried with G1 GC. Could anyone please share their GC settings? > At code level I am: > 1. reading orc table using dataframe > 2. map df to rdd of my cas