Idan Zalzberg created SPARK-5319:
------------------------------------
Summary: Choosing partition size instead of count
Key: SPARK-5319
URL: https://issues.apache.org/jira/browse/SPARK-5319
Project: Spark
Issue Type: Brainstorming
Reporter: Idan Zalzberg
With the current API, there are multiple places where you can set the
partition count when reading from sources.
However, in my experience it is sometimes more useful to set the partition
size (in MB) and infer the count from that.
In my experience, Spark is sensitive to partition size: if partitions are too
big, the memory needed per core increases, and if they are too small, stage
times increase significantly. I'd like to stay in the "sweet spot" of
partition size without repeatedly adjusting the partition count until I find
it.
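For illustration, below is a minimal sketch of the manual workaround this proposal would make unnecessary: asking the filesystem for the total input size and deriving a partition count from a target partition size by hand. The input path and the targetPartitionSizeMB value are illustrative assumptions, not part of any existing Spark API.

{code}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object PartitionBySize {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-by-size"))

    val inputPath = "hdfs:///data/events"  // illustrative input path
    val targetPartitionSizeMB = 128L       // the "sweet spot" we want to stay in

    // Ask the filesystem for the total input size, then infer the partition count.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val totalBytes = fs.getContentSummary(new Path(inputPath)).getLength
    val numPartitions =
      math.max(1L, totalBytes / (targetPartitionSizeMB * 1024 * 1024)).toInt

    // textFile only accepts a minimum partition count, not a size,
    // so we have to work backwards from the size ourselves.
    val lines = sc.textFile(inputPath, numPartitions)
    println(s"total bytes: $totalBytes, requested partitions: $numPartitions, " +
      s"actual partitions: ${lines.partitions.length}")

    sc.stop()
  }
}
{code}

Note that the partition sizes actually produced still depend on how the underlying input format splits the data, so this only approximates the target size; a size-based API could handle that internally.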