>> ---------- Forwarded message ----------
>> From: Rick Moritz <rah...@gmail.com>
>> Date: Wed, Aug 19, 2015 at 2:46 PM
>> Subject: Re: Strange shuffle behaviour difference between Zeppelin and
>> Spark-shell
>> To: Igor Berman <igor.ber...@gmail.com>
>>
>>> using very similar configurations and identical data, machines and code
>>> (identical DAGs and sources) and identical spark binaries? Why would code
>>> launched from spark-shell generate more shuffled data for the same number
>>> of shuffled tuples?
>>>
>>> An analysis would be much appreciated.
any differences in number of cores, memory settings for executors?
On 19 August 2015 at 09:49, Rick Moritz <rah...@gmail.com> wrote:
> Dear list,
> I am observing a very strange difference in behaviour between a Spark
> 1.4.0-rc4 REPL (locally compiled with Java 7) and a Spark 1.4.0 zeppelin
i would compare spark ui metrics for both cases and see any
differences (number of partitions, number of spills etc)
why can't you make repl to be consistent with zeppelin spark version?
might be rc has issues...
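One way to put those UI metrics side by side is the monitoring REST API that ships with Spark 1.4; a rough sketch only — the driver host and application id are placeholders, and the exact JSON field names should be checked against the docs:

```shell
# Pull per-stage metrics from each environment and compare the shuffle
# figures (the Spark UI serves this API on port 4040 by default;
# "driver-host" and "app-id" are placeholders).
curl -s http://driver-host:4040/api/v1/applications/app-id/stages \
  | grep -i shuffle
```

Running this against both the spark-shell session and the Zeppelin interpreter would show whether the per-stage shuffle byte counts really diverge for the same record counts.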
On 19 August 2015 at 14:42, Rick Moritz <rah...@gmail.com> wrote:
> No, the setup
oops, forgot to reply-all on this thread.
---------- Forwarded message ----------
From: Rick Moritz <rah...@gmail.com>
Date: Wed, Aug 19, 2015 at 2:46 PM
Subject: Re: Strange shuffle behaviour difference between Zeppelin and
Spark-shell
To: Igor Berman <igor.ber...@gmail.com>
Those values are not explicitly
No, the setup is one driver with 32g of memory, and three executors each
with 8g of memory in both cases. No core-number has been specified, thus it
should default to single-core (though I've seen the yarn-owned jvms
wrapping the executors take up to 3 cores in top). That is, unless, as I
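For reference, the setup described above corresponds roughly to launch flags like these (a sketch only — the thread does not show the actual launch commands, and the master URL is an assumption based on the mention of YARN):

```shell
# Approximation of the described setup, using standard Spark 1.4
# spark-shell flags; the real invocation is not shown in the thread.
spark-shell \
  --master yarn-client \
  --driver-memory 32g \
  --num-executors 3 \
  --executor-memory 8g
# --executor-cores is left unset, so each executor defaults to 1 core
# on YARN -- matching the "should default to single-core" expectation.
```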
Dear list,
I am observing a very strange difference in behaviour between a Spark
1.4.0-rc4 REPL (locally compiled with Java 7) and a Spark 1.4.0 zeppelin
interpreter (compiled with Java 6 and sourced from maven central).
The workflow loads data from Hive, applies a number of transformations