[
https://issues.apache.org/jira/browse/SPARK-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264010#comment-15264010
]
Thomas Graves commented on SPARK-14963:
---------------------------------------
I'm definitely fine with it but someone else might also be looking at it under
https://github.com/apache/spark/pull/12735
Here is some other information about this I found when I looked at it a bit
more:
I think we can do it without reflection by defining our own setRecoveryPath
function in YarnShuffleService (leave @Override off so it works with older
versions of Hadoop). Have the path default to some invalid value; on Hadoop
2.5 or greater YARN will call the function and set it to a real value. Then in
our code we can check whether it is set: if it is, use it, and if not, fall
back to the current implementation. Note that setRecoveryPath is the only one
we really need to define, since getRecoveryPath is protected, but to be safe we
might also implement that. We can store our own path.
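
A minimal sketch of that idea, not the actual patch. To stay self-contained it
uses a hypothetical stand-in class and java.io.File; the real
YarnShuffleService extends YARN's AuxiliaryService and would receive an
org.apache.hadoop.fs.Path. The class name, the chooseDbDir helper and the use
of null as the "invalid" default are all assumptions made for illustration.

```java
import java.io.File;

// Hypothetical stand-in for YarnShuffleService (assumption: the real class
// extends org.apache.hadoop.yarn.server.api.AuxiliaryService).
class RecoveryPathSketch {
    // null plays the role of the "invalid" default: it means YARN never
    // called setRecoveryPath, i.e. we are on Hadoop older than 2.5.
    private File recoveryPath = null;

    // Deliberately no @Override. On Hadoop < 2.5 the parent class has no such
    // method, so the annotation would break compilation there; on 2.5+ this
    // method would still override the parent's and be called by YARN.
    public void setRecoveryPath(File path) {
        this.recoveryPath = path;
    }

    // Implemented "to be safe"; returns the path we stored ourselves.
    protected File getRecoveryPath() {
        return recoveryPath;
    }

    // Hypothetical helper: use the YARN-provided path when it was set,
    // otherwise fall back to the directory the current code would pick.
    public File chooseDbDir(File legacyLocalDir) {
        return recoveryPath != null ? recoveryPath : legacyLocalDir;
    }
}
```

With this shape no reflection is needed: whether the path was set is the only
signal the rest of the code has to look at.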
The only other thing here is that we may want to handle upgrading. If you are
currently running the shuffle service, the ldb will be in the local dirs, but
after you upgrade it will go to a new path and the service won't find the old
one. To handle this we could just look for it in the new path first; if it is
not there, look for it in the old locations and, if found, move it to the new
location.
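
The lookup order described above could be sketched roughly as follows. This is
only an illustration using java.nio.file on local paths; the class name, the
method name and the dbName parameter are hypothetical and not taken from the
Spark code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, for illustration only.
class LevelDbLocator {
    // Returns the location under newDir that the DB should be opened from.
    // If no DB exists there yet but one is found in an old local dir, it is
    // moved to the new location first.
    static Path locate(Path newDir, List<Path> oldLocalDirs, String dbName)
            throws IOException {
        Path target = newDir.resolve(dbName);
        if (Files.exists(target)) {
            return target; // already in the new path
        }
        for (Path oldDir : oldLocalDirs) {
            Path candidate = oldDir.resolve(dbName);
            if (Files.exists(candidate)) {
                Files.createDirectories(newDir);
                // Note: Files.move on a non-empty directory only works as a
                // rename on the same file store; moving across file systems
                // would need a recursive copy and delete instead.
                Files.move(candidate, target);
                return target;
            }
        }
        // Nothing to migrate; the caller would create a fresh DB here.
        return target;
    }
}
```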
> YarnShuffleService should use YARN getRecoveryPath() for leveldb location
> -------------------------------------------------------------------------
>
> Key: SPARK-14963
> URL: https://issues.apache.org/jira/browse/SPARK-14963
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, YARN
> Affects Versions: 1.6.1
> Reporter: Thomas Graves
>
> The YarnShuffleService currently just picks a directory in the YARN local
> dirs to store the leveldb file. YARN added an interface in Hadoop 2.5,
> getRecoveryPath(), to get the location where it should be storing this.
> We should change to use getRecoveryPath(). This does mean we will have to use
> reflection or similar to check for its existence though since it doesn't
> exist before hadoop 2.5
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)