[ https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-12546:
------------------------------------

    Assignee: Michael Armbrust  (was: Apache Spark)

> Writing to partitioned parquet table can fail with OOM
> ------------------------------------------------------
>
>                 Key: SPARK-12546
>                 URL: https://issues.apache.org/jira/browse/SPARK-12546
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Nong Li
>            Assignee: Michael Armbrust
>            Priority: Blocker
>
> It is possible for jobs to fail with OOM when writing to a partitioned 
> Parquet table. While this was probably always possible, it is more likely 
> in 1.6 due to the memory manager changes. The unified memory manager lets 
> Spark use more of the process memory (in particular, for execution), which 
> gets us into this state more often. The issue can hit libraries that 
> consume a lot of memory, such as Parquet. Prior to 1.6, these libraries 
> were more likely to use memory that Spark was not using (i.e. from the 
> storage pool); in 1.6, that storage memory can now be used for execution.
>
> There are a couple of configs that can help with this issue, applied in 
> the sketch below:
>   - parquet.memory.pool.ratio: a Parquet config for how much of the heap 
> the Parquet writers may use. It defaults to 0.95; consider a much lower 
> value (e.g. 0.1).
>   - spark.memory.fraction: a Spark config for how much of the heap is 
> allocated to Spark itself. Consider setting this to 0.6.
>
> These settings may cause jobs to spill more often, but they require less 
> memory. More aggressive tuning will control this trade-off.
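>
> A minimal sketch of applying both settings (Spark 1.6-era Scala API; the 
> app name is illustrative). Note that parquet.memory.pool.ratio is a 
> Parquet/Hadoop property, so it goes on the Hadoop configuration rather 
> than the SparkConf:
>
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     val conf = new SparkConf()
>       .setAppName("partitioned-parquet-write") // illustrative name
>       // Shrink Spark's share of the heap, leaving headroom for Parquet
>       // writer buffers.
>       .set("spark.memory.fraction", "0.6")
>     val sc = new SparkContext(conf)
>
>     // The Parquet writers read this from the Hadoop configuration, not
>     // from the SparkConf.
>     sc.hadoopConfiguration.set("parquet.memory.pool.ratio", "0.1")
>
> The same pair can be passed to spark-submit as 
> --conf spark.memory.fraction=0.6 and 
> --conf spark.hadoop.parquet.memory.pool.ratio=0.1 (the spark.hadoop. 
> prefix forwards a property into the Hadoop configuration).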


