[jira] [Commented] (SPARK-14963) YarnShuffleService should use YARN getRecoveryPath() for leveldb location

Thomas Graves (JIRA) Fri, 29 Apr 2016 05:59:21 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264010#comment-15264010
 ]


Thomas Graves commented on SPARK-14963:
---------------------------------------

I'm definitely fine with it but someone else might also be looking at it under 
https://github.com/apache/spark/pull/12735

Here is some other information about this I found when I looked at it a bit 
more:

I think we can do it without reflection by defining our own setRecoveryPath 
function in YarnShuffleService (leave override off so it works with older 
versions of hadoop). Have it default to some invalid value, and if it's hadoop 
2.5 or greater it will get called and set it to a real value. Then in our code 
we could check to see if it's set and, if it is, use it; if not, we could fall 
back to the current implementation. Note that setRecoveryPath is the only one 
we really need to define since getRecoveryPath is protected, but to be safe we 
might also implement that. We can store our own path.
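
A rough sketch of that idea, not the actual patch. To keep it self-contained it 
does not depend on hadoop: java.nio.file.Path stands in for hadoop's Path, the 
class name ShuffleServiceSketch and the helper resolveDbDir are made up for 
illustration, and null is assumed as the "invalid value" default. In the real 
service the class would extend the YARN AuxiliaryService.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for YarnShuffleService; the real class would extend
// YARN's AuxiliaryService, which is omitted here so the sketch compiles alone.
class ShuffleServiceSketch {
    // Assumed sentinel: null means "nobody ever called setRecoveryPath".
    private Path recoveryPath = null;

    // Deliberately no @Override. Against a base class that lacks this method
    // it is just an ordinary method that is never called; against a base class
    // that declares it, it overrides and receives the real recovery path.
    public void setRecoveryPath(Path path) {
        this.recoveryPath = path;
    }

    // Implemented as well "to be safe"; returns our own stored copy.
    protected Path getRecoveryPath() {
        return recoveryPath;
    }

    // Hypothetical helper: prefer the recovery path when it was set,
    // otherwise fall back to the directory the current code would pick.
    Path resolveDbDir(Path legacyLocalDir) {
        return recoveryPath != null ? recoveryPath : legacyLocalDir;
    }
}
```

With this shape the fallback needs no version check: if setRecoveryPath was 
never invoked, resolveDbDir simply returns the local dir it was given.
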
The only other thing here is that we may want to handle upgrading. If you are 
currently running the shuffle service, the ldb will be in the local dirs, but 
when you upgrade it will go to a new path and won't find the old one. To handle 
this we could just look for it in the new path first, and if it's not there, 
look for it in the old locations and, if found, move it to the new location.
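
The upgrade handling could look roughly like the following. This is only a 
sketch under assumptions: RecoveryDbLocator, locate and the dbName parameter 
are hypothetical names, and java.nio.file is used in place of whatever file 
handling the service really uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Spark: finds the leveldb directory,
// migrating it from an old local dir to the recovery dir when needed.
class RecoveryDbLocator {
    static Path locate(Path recoveryDir, List<Path> legacyDirs, String dbName)
            throws IOException {
        Path target = recoveryDir.resolve(dbName);
        // 1. Look in the new location first.
        if (Files.exists(target)) {
            return target;
        }
        // 2. Otherwise look in the old locations and move the first match.
        for (Path dir : legacyDirs) {
            Path old = dir.resolve(dbName);
            if (Files.exists(old)) {
                Files.createDirectories(recoveryDir);
                // Note: moving a non-empty directory only works as a plain
                // rename; across file systems this throws and a recursive
                // copy would be needed instead.
                Files.move(old, target);
                return target;
            }
        }
        // 3. Nothing to migrate: the caller creates a fresh db at the new path.
        return target;
    }
}
```

Checking the new path first means the move happens at most once; later 
restarts find the db under the recovery dir and never touch the local dirs.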

> YarnShuffleService should use YARN getRecoveryPath() for leveldb location
> -------------------------------------------------------------------------
>
>                 Key: SPARK-14963
>                 URL: https://issues.apache.org/jira/browse/SPARK-14963
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, YARN
>    Affects Versions: 1.6.1
>            Reporter: Thomas Graves
>
> The YarnShuffleService currently just picks a directory in the yarn local 
> dirs to store the leveldb file.  YARN added an interface in hadoop 2.5, 
> getRecoveryPath(), to get the location where it should be storing this.
> We should change to use getRecoveryPath(). This does mean we will have to use 
> reflection or similar to check for its existence though, since it doesn't 
> exist before hadoop 2.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

