[
https://issues.apache.org/jira/browse/SPARK-24519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Graves resolved SPARK-24519.
-----------------------------------
Resolution: Fixed
Assignee: Hieu Tri Huynh
Fix Version/s: 2.4.0
> MapStatus has 2000 hardcoded
> ----------------------------
>
> Key: SPARK-24519
> URL: https://issues.apache.org/jira/browse/SPARK-24519
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Hieu Tri Huynh
> Assignee: Hieu Tri Huynh
> Priority: Minor
> Fix For: 2.4.0
>
>
> MapStatus uses a hardcoded value of 2000 partitions to determine whether it
> should use the highly compressed map status. We should make it configurable
> so that users can more easily tune their jobs without having to modify their
> code to change the number of partitions. Note that we can leave this as an
> internal/undocumented config for now, until we have more advice for users on
> how to set it.
> Some of my reasoning:
> The config gives users a way to change this behavior without having to change
> code, redeploy the jar, and run again. You can simply change the config and
> rerun. It also allows for easier experimentation. Changing the number of
> partitions has other side effects; whether they are good or bad depends on
> the situation. It can be worse, as you could be increasing the number of
> output files when you don't want to, and it affects the number of tasks
> needed and thus the executors needed to run them in parallel, etc.
> There have been various talks about this number at Spark Summits where people
> have told customers to increase their partition count to 2001. Note that if
> you search for "spark 2000 partitions" you will find various sources all
> talking about this number. This shows that people are modifying their code to
> take this into account, so it seems to me that making this configurable would
> be better. Once we have more advice for users, we could expose the config and
> document it.
>
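
For illustration, the threshold check described above might be sketched as
follows. This is a minimal Python sketch under stated assumptions, not Spark's
actual code: the function name, the parameter name, and the returned labels
are hypothetical. Only the default of 2000 comes from the description, and the
strict "greater than" comparison is an assumption inferred from the advice to
use 2001 partitions.

```python
# Hypothetical sketch of a configurable map-status threshold.
# None of these names are taken from Spark itself.

DEFAULT_MIN_PARTITIONS_TO_HIGHLY_COMPRESS = 2000  # the value that was hardcoded


def choose_map_status(num_partitions,
                      min_partitions_to_highly_compress=DEFAULT_MIN_PARTITIONS_TO_HIGHLY_COMPRESS):
    """Return which map status representation would be used.

    Assumes the highly compressed form is chosen only when the partition
    count is strictly greater than the threshold.
    """
    if num_partitions > min_partitions_to_highly_compress:
        return "highly_compressed"
    return "compressed"


# With the old hardcoded value, users had to change their partition count:
print(choose_map_status(2000))
print(choose_map_status(2001))

# With a configurable threshold, the partition count can stay the same:
print(choose_map_status(500, min_partitions_to_highly_compress=100))
```

The last call shows the point of the change: the job keeps its 500 partitions,
and only the threshold is adjusted.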