[jira] [Commented] (SPARK-3685) Spark's local dir should accept only local paths

Steve Loughran (JIRA) Tue, 17 Mar 2015 06:12:03 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365087#comment-14365087
 ]


Steve Loughran commented on SPARK-3685:
---------------------------------------

YARN-1197 covers resizing existing YARN containers. That would be the real 
solution to altering the memory footprint of an executor in a container, at 
least if the JVM can shrink its heap. However, that JIRA is "dormant"; I don't 
know if anyone is going to pick it up in the near term.

SPARK-1529 looks at switching to the Hadoop FS APIs, but doesn't mandate remote 
storage: it just makes it possible.

Switching to HDFS storage, as Andrew proposes, risks hurting performance:
# network traffic, unless the replication factor == 1 (though with that setting 
there's only one preferred location for the new container).
# disk IO conflicts with other HDFS work going on on the localhost.
# the overhead of going via the TCP stack, unless it is bypassed via Unix 
domain sockets (as HBase does).

There's a risk, therefore, that the performance of all work will suffer just to 
support a single use case: "flex executor container & JVM size". That also 
ignores the scheduling risk of the smaller container not being allocated 
resources.

Hooking up the YARN NM shuffle would be the better way to do this. If that 
shuffle can't handle the wiring-up, it's probably easier to fix that than the 
whole YARN container-resize problem.



> Spark's local dir should accept only local paths
> ------------------------------------------------
>
>                 Key: SPARK-3685
>                 URL: https://issues.apache.org/jira/browse/SPARK-3685
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.1.0
>            Reporter: Andrew Or
>
> When you try to set local dirs to "hdfs:/tmp/foo" it doesn't work. What it 
> will try to do is create a folder called "hdfs:" and put "tmp" inside it. 
> This is because in Util#getOrCreateLocalRootDirs we use java.io.File instead 
> of Hadoop's file system to parse this path. We also need to resolve the path 
> appropriately.
> This may not have an urgent use case, but it fails silently and does what is 
> least expected.
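
The failure described in the issue can be reproduced in isolation. The sketch 
below is illustrative only and is not Spark's code: the class name 
LocalDirCheck and the helper isLocalPath are hypothetical, and it assumes a 
Unix-like filesystem. It shows that java.io.File treats "hdfs:/tmp/foo" as a 
relative path whose first component is the directory "hdfs:", and how a scheme 
check via java.net.URI could reject non-local paths up front.

```java
import java.io.File;
import java.net.URI;

// Hypothetical illustration; not part of Spark.
class LocalDirCheck {

    // Assumed rule: a path is "local" if it has no URI scheme or the scheme is "file".
    // Note: URI.create throws IllegalArgumentException for strings that are not
    // valid URIs (for example, paths containing spaces or backslashes).
    static boolean isLocalPath(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null || scheme.equals("file");
    }

    public static void main(String[] args) {
        File f = new File("hdfs:/tmp/foo");
        // java.io.File knows nothing about URI schemes: on a Unix-like system
        // this is a relative path, so mkdirs() would create "hdfs:" under the
        // current working directory, with "tmp/foo" inside it.
        System.out.println("absolute?  " + f.isAbsolute());
        System.out.println("parent:    " + f.getParent());
        System.out.println("name:      " + f.getName());

        // A scheme check rejects the path before any directory is created.
        System.out.println("hdfs local? " + isLocalPath("hdfs:/tmp/foo"));
        System.out.println("/tmp local? " + isLocalPath("/tmp/foo"));
    }
}
```

Failing fast on a non-local scheme, rather than silently creating an 
unexpected directory, is one way to address the "fails silently" complaint; 
whether to reject or resolve such paths is a design choice the issue leaves 
open.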



