Hi,
Coalescing does not happen automatically; you need to control the
number of partitions yourself.
Basically, the number of partitions respects `spark.default.parallelism`,
which by default is the number of cores on your machine.
http://spark.apache.org/docs/latest/configuration.html#execution-behavior
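A minimal sketch of the point above (assuming a local Scala Spark app; the filtered RDD here is a hypothetical stand-in for `ngauss_rdd2`): the filter drops most of the data, but the child RDD keeps the parent's partition count until you coalesce manually.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceSketch {
  def main(args: Array[String]): Unit = {
    // spark.default.parallelism defaults to the number of cores;
    // set it explicitly here so the example is reproducible.
    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("coalesce-sketch")
      .set("spark.default.parallelism", "4")
    val sc = new SparkContext(conf)

    val parent = sc.parallelize(1 to 1000000)  // 4 partitions
    val child  = parent.filter(_ % 1000 == 0)  // far less data, but
    println(child.getNumPartitions)            // still 4: inherited from parent

    // Spark never coalesces for you; shrink the partition count yourself
    // when a transformation discards most of the data.
    val packed = child.coalesce(1)
    println(packed.getNumPartitions)           // 1

    sc.stop()
  }
}
```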
// maropu
On T
Hello List,
I was wondering what is the design principle that partition size of
an RDD is inherited from the parent. See one simple example below
[*]. 'ngauss_rdd2' has significantly less data, intuitively in such
cases, shouldn't spark invoke coalesce automatically for performance?
What would b