[jira] [Updated] (SPARK-10000) Consolidate cache memory management and execution memory management

Andrew Or (JIRA) Wed, 07 Oct 2015 14:43:36 -0700

     [ https://issues.apache.org/jira/browse/SPARK-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andrew Or updated SPARK-10000:
------------------------------
    Description: 
Memory management in Spark is currently broken down into two disjoint regions: 
one for execution and one for storage. The sizes of these regions are 
statically configured and fixed for the duration of the application.

There are several limitations to this approach. It requires user expertise to 
avoid unnecessary spilling, and there are no sensible defaults that will work 
for all workloads. As a Spark user, I want Spark to manage the memory more 
intelligently so I do not need to worry about how to statically partition the 
execution (shuffle) memory fraction and cache memory fraction. More 
importantly, applications that do not use caching use only a small fraction of 
the heap space, resulting in suboptimal performance.

Instead, we should unify these two regions and let one borrow from another if 
possible.

  was:
Memory management in Spark is currently broken down into two disjoint regions: 
one for execution and one for storage. The sizes of these regions are 
statically configured and fixed for the duration of the application.

There are several limitations to this approach. It requires user expertise to 
avoid unnecessary spilling, and there are no sensible defaults that will work 
for all workloads. As a Spark user, I want Spark to manage the memory more 
intelligently so I do not need to worry about how to statically partition the 
execution (shuffle) memory fraction and cache memory fraction. Most 
importantly, applications that do not use caching use only a small fraction of 
the heap space, resulting in suboptimal performance.

> Consolidate cache memory management and execution memory management
> -------------------------------------------------------------------
>
>                 Key: SPARK-10000
>                 URL: https://issues.apache.org/jira/browse/SPARK-10000
>             Project: Spark
>          Issue Type: Story
>          Components: Block Manager, Spark Core
>            Reporter: Reynold Xin
>
> Memory management in Spark is currently broken down into two disjoint 
> regions: one for execution and one for storage. The sizes of these regions 
> are statically configured and fixed for the duration of the application.
>
> There are several limitations to this approach. It requires user expertise to 
> avoid unnecessary spilling, and there are no sensible defaults that will work 
> for all workloads. As a Spark user, I want Spark to manage the memory more 
> intelligently so I do not need to worry about how to statically partition the 
> execution (shuffle) memory fraction and cache memory fraction. More 
> importantly, applications that do not use caching use only a small fraction 
> of the heap space, resulting in suboptimal performance.
>
> Instead, we should unify these two regions and let one borrow from another if 
> possible.
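
To make the contrast in the description concrete, here is a minimal sketch in Python of a fixed two-region split next to a single shared pool. It is only an illustration of the idea, not Spark's implementation: the class names, method names, and sizes are all hypothetical, and the sketch deliberately leaves out what should happen when one side wants borrowed memory back, since the issue does not specify that.

```python
from dataclasses import dataclass


@dataclass
class StaticRegions:
    """Hypothetical model of two fixed, disjoint regions."""
    execution_limit: int
    storage_limit: int
    execution_used: int = 0
    storage_used: int = 0

    def acquire_execution(self, n: int) -> int:
        # Execution can never exceed its own fixed limit, even if the
        # storage region is completely empty; the rest would have to spill.
        granted = max(0, min(n, self.execution_limit - self.execution_used))
        self.execution_used += granted
        return granted

    def acquire_storage(self, n: int) -> int:
        granted = max(0, min(n, self.storage_limit - self.storage_used))
        self.storage_used += granted
        return granted


@dataclass
class UnifiedPool:
    """Hypothetical model of one shared region that both sides draw from."""
    total: int
    execution_used: int = 0
    storage_used: int = 0

    def _free(self) -> int:
        return self.total - self.execution_used - self.storage_used

    def acquire_execution(self, n: int) -> int:
        # Execution may use any memory that storage is not currently using.
        granted = max(0, min(n, self._free()))
        self.execution_used += granted
        return granted

    def acquire_storage(self, n: int) -> int:
        # Storage may likewise use whatever execution has left free.
        granted = max(0, min(n, self._free()))
        self.storage_used += granted
        return granted


# An application that caches nothing asks for 700 units of execution memory.
static = StaticRegions(execution_limit=200, storage_limit=600)
unified = UnifiedPool(total=800)
static_grant = static.acquire_execution(700)    # capped by the fixed limit
unified_grant = unified.acquire_execution(700)  # can use the idle storage share
```

With these made-up sizes the static split grants 200 units and leaves the 600 storage units idle, while the shared pool grants all 700. That is the "small fraction of the heap space" problem the description points at.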



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

