Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-29 Thread Rick Moritz
using very similar configurations and identical data, machines and code (identical DAGs and sources) and identical spark binaries? Why would code launched from spark-shell generate more shuffled data for the same number of

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-28 Thread Kartik Mathur
-- Forwarded message -- From: Rick Moritz <rah...@gmail.com> Date: Wed, Aug 19, 2015 at 2:46 PM Subject: Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell To: Igor Berman <igor.ber...@gmail.com>

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-28 Thread Kartik Mathur
identical DAGs and sources) and identical spark binaries? Why would code launched from spark-shell generate more shuffled data for the same number of shuffled tuples? An analysis would be much appreciated.

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-09-28 Thread Rick Moritz
From: Rick Moritz <rah...@gmail.com> Date: Wed, Aug 19, 2015 at 2:46 PM Subject: Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell To: Igor Berman <igor.ber...@gmail.com> Those values are not explicitly

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Igor Berman
Any differences in number of cores, memory settings for executors? On 19 August 2015 at 09:49, Rick Moritz rah...@gmail.com wrote: Dear list, I am observing a very strange difference in behaviour between a Spark 1.4.0-rc4 REPL (locally compiled with Java 7) and a Spark 1.4.0 Zeppelin

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Igor Berman
I would compare the Spark UI metrics for both cases and look for any differences (number of partitions, number of spills, etc.). Why can't you make the REPL consistent with the Zeppelin Spark version? The RC might have issues... On 19 August 2015 at 14:42, Rick Moritz rah...@gmail.com wrote: No, the setup
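
The comparison suggested in the reply above can be done by hand: copy the stage metrics of both runs out of the Spark UI and diff them. A minimal sketch, assuming the metrics have been typed into plain dicts; the helper name diff_metrics, the metric names and the numbers are hypothetical placeholders, not values from this thread.

```python
def diff_metrics(run_a, run_b):
    """Return {metric: (value_in_a, value_in_b)} for every metric that differs.

    A metric missing from one run is reported with None on that side.
    """
    names = sorted(set(run_a) | set(run_b))
    return {
        name: (run_a.get(name), run_b.get(name))
        for name in names
        if run_a.get(name) != run_b.get(name)
    }


# Hypothetical placeholder values, copied by hand from the two Spark UIs.
zeppelin_run = {"partitions": 200, "spills": 0, "shuffle_write_mb": 100}
shell_run = {"partitions": 200, "spills": 4, "shuffle_write_mb": 250}

for name, (a, b) in diff_metrics(zeppelin_run, shell_run).items():
    print(f"{name}: zeppelin={a} spark-shell={b}")
```

Metrics that agree in both runs are left out, so only the candidates for explaining the difference are printed.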

Fwd: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Rick Moritz
oops, forgot to reply-all on this thread. -- Forwarded message -- From: Rick Moritz rah...@gmail.com Date: Wed, Aug 19, 2015 at 2:46 PM Subject: Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell To: Igor Berman igor.ber...@gmail.com Those values

Re: Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Rick Moritz
No, the setup is one driver with 32g of memory, and three executors each with 8g of memory in both cases. No core-number has been specified, thus it should default to single-core (though I've seen the YARN-owned JVMs wrapping the executors take up to 3 cores in top). That is, unless, as I

Strange shuffle behaviour difference between Zeppelin and Spark-shell

2015-08-19 Thread Rick Moritz
Dear list, I am observing a very strange difference in behaviour between a Spark 1.4.0-rc4 REPL (locally compiled with Java 7) and a Spark 1.4.0 Zeppelin interpreter (compiled with Java 6 and sourced from Maven Central). The workflow loads data from Hive, applies a number of transformations