[
https://issues.apache.org/jira/browse/SPARK-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15661760#comment-15661760
]
Aniket Bhatnagar commented on SPARK-18421:
------------------------------------------
I agree that Spark doesn't manage storage, and therefore running an agent and
dynamically adding storage to a host is outside its scope. However, what is in
scope for Spark is the ability to use newly added storage without forcing a
restart of the executor process. Specifically, spark.local.dirs needs to be a
dynamic property. For example, spark.local.dirs could be configured as a glob
pattern (something like /mnt*), and whenever a new disk is added and mounted (as
/mnt<disk_num>), Spark's shuffle service should be able to use the newly
added local disk. Additionally, there may be a task to rebalance shuffle blocks
once a disk is added, so that all local dirs are once again used equally.
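To illustrate the glob idea: a minimal sketch (a hypothetical helper, not any existing Spark API) of resolving a glob-style spark.local.dirs value such as /mnt* to the set of directories currently mounted. Re-running this resolution periodically, or on a filesystem event, is what would let the shuffle service pick up a freshly mounted /mnt<disk_num> without an executor restart. The class name and demo paths are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expand a glob-style local-dirs setting
// (e.g. "/mnt*") into the directories that currently exist.
public class LocalDirsResolver {
    public static List<Path> resolve(String globPattern) throws IOException {
        Path pattern = Paths.get(globPattern);
        Path parent = pattern.getParent() == null ? Paths.get(".") : pattern.getParent();
        String glob = pattern.getFileName().toString();
        List<Path> dirs = new ArrayList<>();
        // DirectoryStream accepts a glob, so "mnt*" matches mnt1, mnt2, ...
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(parent, glob)) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) { // only mounted directories, not plain files
                    dirs.add(p);
                }
            }
        }
        dirs.sort(null); // deterministic order across invocations
        return dirs;
    }

    public static void main(String[] args) throws IOException {
        // Demo against a temp directory standing in for the filesystem root.
        Path root = Files.createTempDirectory("mounts");
        Files.createDirectory(root.resolve("mnt1"));
        Files.createDirectory(root.resolve("mnt2"));
        Files.createFile(root.resolve("mnt3")); // plain file, not a mount point
        List<Path> dirs = resolve(root.resolve("mnt*").toString());
        System.out.println(dirs.size()); // prints 2
    }
}
```

Each call returns the current set of matching directories, so a newly mounted /mnt3 directory would appear on the next resolution without any process restart.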
I don't think detection of newly mounted directories, rebalancing of blocks,
etc. is cloud-specific, as all of this can be done using Java's IO/NIO APIs.
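As a sketch of the detection half: the standard java.nio.file.WatchService can report a new directory appearing under the mount root. The helper below is a hypothetical illustration (class and method names are assumptions), showing how a component could block until a new mount shows up and then hand the path to whatever manages local dirs.

```java
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;
import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;

// Hypothetical sketch: wait for a new directory (a fresh mount point)
// to appear under `parent`, using the portable NIO WatchService.
public class MountWatcher {
    public static Path awaitNewDir(Path parent, long timeoutSeconds) throws Exception {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            parent.register(watcher, ENTRY_CREATE);
            WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
            if (key == null) {
                return null; // nothing appeared within the timeout
            }
            for (WatchEvent<?> event : key.pollEvents()) {
                Path created = parent.resolve((Path) event.context());
                if (Files.isDirectory(created)) {
                    return created; // candidate to add to the local-dirs set
                }
            }
            return null;
        }
    }
}
```

Note that WatchService latency varies by platform (inotify on Linux is near-instant; the fallback implementation polls), so a production version would likely combine it with periodic re-scanning.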
This feature would, however, be most useful for users running Spark in the
cloud. Currently, users are expected to guess their shuffle storage footprint
and mount appropriately sized disks up front. If the guess is wrong, the job
fails, wasting a lot of time.
> Dynamic disk allocation
> -----------------------
>
> Key: SPARK-18421
> URL: https://issues.apache.org/jira/browse/SPARK-18421
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 2.0.1
> Reporter: Aniket Bhatnagar
> Priority: Minor
>
> The dynamic allocation feature allows you to add executors and scale
> computation power. This is great; however, I feel we also need a way to
> dynamically scale storage. Currently, if the disk cannot hold the
> spilled/shuffle data, the job is aborted (in YARN, the node manager kills
> the container), causing frustration and loss of time. In deployments like
> AWS EMR, it is possible to run an agent that adds disks on the fly when it
> sees the disks running out of space, and it would be great if Spark could
> immediately start using the added disks, just as it does when new executors
> are added.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)