Re: Spark job server pros and cons

2016-12-09 Thread Shak S
eed to have some learning curve and trouble shooting. On Fri, Dec 9, 2016 at 4:31 PM, Cassa L <lcas...@gmail.com> wrote: > Hi, > So far, I ran spark jobs directly using spark-submit options. I have a > use case to use Spark Job server to run the job. I wanted to find out PROS &

Spark job server pros and cons

2016-12-09 Thread Cassa L
Hi, So far, I have run Spark jobs directly using spark-submit options. I have a use case to use Spark Job Server to run the job. I wanted to find out the pros and cons of using this job server. If anyone can share them, that would be great. My jobs usually connect to multiple data sources like Kafka, Custom

Re: Pros and Cons

2016-05-27 Thread Teng Qiu
yes, only for engine, but maybe newer version has more optimization from tungsten project? at least since spark 1.6? > -- Forwarded message -- > From: Mich Talebzadeh <mich.talebza...@gmail.com> > Date: 27 May 2016 at 17:09 > Subject: Re: Pros and Cons

Fwd: Pros and Cons

2016-05-27 Thread Mich Talebzadeh
: Mich Talebzadeh <mich.talebza...@gmail.com> Date: 27 May 2016 at 17:09 Subject: Re: Pros and Cons To: Teng Qiu <teng...@gmail.com> Cc: Ted Yu <yuzhih...@gmail.com>, Koert Kuipers <ko...@tresata.com>, Jörn Franke <jornfra...@gmail.com>, user <user@spark.apache.org>, Aa

Re: Pros and Cons

2016-05-27 Thread Teng Qiu
tried spark 2.0.0 preview, but no assembly jar there... then just gave up... :p 2016-05-27 17:39 GMT+02:00 Ted Yu : > Teng: > Why not try out the 2.0 SNAPSHOT build? > > Thanks > >> On May 27, 2016, at 7:44 AM, Teng Qiu wrote: >> >> ah, yes, the version

Re: Pros and Cons

2016-05-27 Thread Mich Talebzadeh
Hi Ted, do you mean Hive 2 with spark 2 snapshot build as the execution engine just binaries for snapshot (all ok)? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Pros and Cons

2016-05-27 Thread Ted Yu
Teng: Why not try out the 2.0 SNAPSHOT build? Thanks > On May 27, 2016, at 7:44 AM, Teng Qiu wrote: > > ah, yes, the version is another mess!... no vendor's product > > i tried hadoop 2.6.2, hive 1.2.1 with spark 1.6.1, doesn't work. > > hadoop 2.6.2, hive 2.0.1 with

Re: Pros and Cons

2016-05-27 Thread Teng Qiu
ah, yes, the version is another mess!... no vendor's product i tried hadoop 2.6.2, hive 1.2.1 with spark 1.6.1, doesn't work. hadoop 2.6.2, hive 2.0.1 with spark 1.6.1, works, but need to fix this from hive side https://issues.apache.org/jira/browse/HIVE-13301 the jackson-databind lib from

Re: Pros and Cons

2016-05-27 Thread Mich Talebzadeh
Hi Teng, what version of spark are you using as the execution engine? are you using a vendor's product here? thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Pros and Cons

2016-05-27 Thread Teng Qiu
I agree with Koert and Reynold, spark works well with large dataset now. back to the original discussion, compare SparkSQL vs Hive in Spark vs Spark API. SparkSQL vs Spark API you can simply imagine you are in RDBMS world, SparkSQL is pure SQL, and Spark API is language for writing stored

Re: Pros and Cons

2016-05-26 Thread Koert Kuipers
We do disk-to-disk iterative algorithms in spark all the time, on datasets that do not fit in memory, and it works well for us. I usually have to do some tuning of number of partitions for a new dataset but that's about it in terms of inconveniences. On May 26, 2016 2:07 AM, "Jörn Franke"

Re: Pros and Cons

2016-05-26 Thread Jörn Franke
Spark can handle this, true, but it is optimized for the idea that it works on the same full dataset in-memory due to the underlying nature of machine learning algorithms (iterative). Of course, you can spill over, but that you should avoid. That being said you should have read my

Re: Pros and Cons

2016-05-25 Thread Reynold Xin
On Wed, May 25, 2016 at 9:52 AM, Jörn Franke wrote: > Spark is more for machine learning working iteratively over the whole same > dataset in memory. Additionally it has streaming and graph processing > capabilities that can be used together. > Hi Jörn, The first part is

Re: Pros and Cons

2016-05-25 Thread Jörn Franke
> HTH > > > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > http://talebzadehmich.wordpress.com > > >> On 25 May 2016 at 16:34, Aakash Basu <raj2coo...@gmail.com>

Re: Pros and Cons

2016-05-25 Thread Mich Talebzadeh
wrote: > Hi, > > > > I’m new to the Spark Ecosystem, need to understand the *Pros and Cons *of > fetching data using *SparkSQL vs Hive in Spark vs Spark API.* > > > > *PLEASE HELP!* > > > > Thanks, > > Aakash Basu. >

Pros and Cons

2016-05-25 Thread Aakash Basu
Hi, I’m new to the Spark Ecosystem, need to understand the *Pros and Cons *of fetching data using *SparkSQL vs Hive in Spark vs Spark API.* *PLEASE HELP!* Thanks, Aakash Basu.

Pros and cons -Saving spark data in hive

2015-12-15 Thread Divya Gehlot
Hi, I am a newbie to Spark and I am exploring the options, and their pros and cons, to see which one will work best in a Spark and Hive context. My dataset inputs are CSV files; I am using Spark to process my data and saving it in Hive using HiveContext. 1) Process the CSV file using spark-csv package and create

Re: Pros and cons -Saving spark data in hive

2015-12-15 Thread Sabarish Sasidharan
I am a newbie to Spark and I am exploring the options, and their pros and cons, to see which > one will work best in a Spark and Hive context. My dataset inputs are CSV > files; I am using Spark to process my data and saving it in Hive using > HiveContext > > 1) Process the CSV file using spark-csv pac