[
https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-12546:
------------------------------
Component/s: SQL
> Writing to partitioned parquet table can fail with OOM
> ------------------------------------------------------
>
> Key: SPARK-12546
> URL: https://issues.apache.org/jira/browse/SPARK-12546
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Nong Li
>
> It is possible for jobs to fail with OOM when writing to a partitioned
> Parquet table. While this was probably always possible, it is more likely in
> 1.6 due to the memory manager changes. The unified memory manager lets
> Spark use more of the process memory (in particular, for execution), which
> puts us in this state more often. The issue can surface with libraries that
> consume a lot of memory on their own, such as Parquet. Prior to 1.6, these
> libraries were more likely to use memory that Spark was not using (i.e. headroom
> in the storage pool). In 1.6, that storage memory can now be claimed for execution.
> There are a couple of configs that can help with this issue:
> - parquet.memory.pool.ratio: a Parquet config controlling how much of the
> heap the Parquet writers may use. It defaults to 0.95; consider a much
> lower value (e.g. 0.1).
> - spark.memory.fraction: a Spark config controlling how much of the heap
> is allocated to Spark's unified memory pool. Consider setting this to 0.6.
> With these settings jobs may spill more but should require less memory; more
> aggressive tuning will control this trade-off. A sketch of how to apply both
> settings follows.
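> The following is a minimal, hypothetical sketch (Spark 1.6-era Scala API) of where
> the two settings above could be applied when writing a partitioned Parquet table.
> The application name, column names, and output path are made up for illustration.
> parquet.memory.pool.ratio is passed via Spark's spark.hadoop.* prefix, which copies
> the entry into the Hadoop configuration that parquet-mr reads.
> {code:scala}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
>
> object PartitionedParquetWriteSketch {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf()
>       .setAppName("partitioned-parquet-write")
>       // Spark side: shrink the unified memory pool so more heap is left for
>       // non-Spark consumers such as the Parquet writers (0.6 as suggested above).
>       .set("spark.memory.fraction", "0.6")
>       // Parquet side: spark.hadoop.* entries are copied into the Hadoop
>       // configuration, where parquet-mr looks up its memory pool ratio.
>       .set("spark.hadoop.parquet.memory.pool.ratio", "0.1")
>
>     val sc = new SparkContext(conf)
>     val sqlContext = new SQLContext(sc)
>     import sqlContext.implicits._
>
>     // Hypothetical example data: many distinct partition values means many
>     // Parquet writers (and their buffers) open at the same time.
>     val df = sc.parallelize(1 to 1000000)
>       .map(i => (i, i % 200, s"value_$i"))
>       .toDF("id", "bucket", "payload")
>
>     df.write
>       .partitionBy("bucket")   // one Parquet writer per partition value
>       .parquet("/tmp/partitioned_parquet_example")
>
>     sc.stop()
>   }
> }
> {code}
> The same settings can also be supplied at submit time, e.g.
> --conf spark.memory.fraction=0.6 --conf spark.hadoop.parquet.memory.pool.ratio=0.1,
> without changing application code.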