[ 
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975914#comment-13975914
 ] 

Cheng Lian commented on SPARK-1529:
-----------------------------------

After some investigation, I came to the conclusion that, unlike adding Tachyon 
support, allowing {{spark.local.dir}} to point to a Hadoop FS location cannot be 
done by simply adding something like {{HDFSBlockManager}} / {{HDFSStore}}; 
instead, we have to refactor the related local FS access code to go through the 
HDFS interfaces. It also seems hard to make this change incremental. Besides 
writing shuffle map output, at least two other places reference 
{{spark.local.dir}}:

# HTTP broadcasting uses {{spark.local.dir}} as its resource root and accesses 
the local FS with {{java.io.File}}
# {{FileServerHandler}} accesses {{spark.local.dir}} via {{DiskBlockManager}} 
and reads local files with {{FileSegment}} and {{java.io.File}}

Adding a new block manager / store for HDFS can't fix these places. I'm 
currently working on this issue by:

# Refactoring {{FileSegment.file}} from {{java.io.File}} to 
{{org.apache.hadoop.fs.Path}}
# Refactoring {{DiskBlockManager}}, {{DiskStore}}, {{HttpBroadcast}} and 
{{FileServerHandler}} to leverage the HDFS interfaces
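
To make the first step concrete, here is a rough sketch of what a {{Path}}-backed 
{{FileSegment}} might look like (the {{open}} method and its shape are 
hypothetical, not the actual Spark API; the {{FileSystem}} is resolved from the 
Hadoop configuration, so local paths, HDFS URIs and MapR volumes would all be 
handled uniformly):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, Path}

// Sketch only: a FileSegment referencing a Hadoop Path instead of java.io.File.
class FileSegment(val path: Path, val offset: Long, val length: Long) {

  // Resolve the FileSystem from the configured scheme (file://, hdfs://, maprfs://, ...)
  // and position the stream at this segment's offset.
  def open(conf: Configuration): FSDataInputStream = {
    val fs = path.getFileSystem(conf)
    val in = fs.open(path)
    in.seek(offset)
    in
  }
}
{code}

Callers such as {{FileServerHandler}} would then read through 
{{FSDataInputStream}} rather than {{java.io.FileInputStream}}, which is what 
makes the change non-incremental.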

Please leave comments if I missed anything or if there's a simpler way to work 
around this.

(PS: We should definitely refactor the block manager related code to reduce 
duplication and encapsulate more details. Maybe the public interface of the 
block manager should only communicate with other components via block IDs and 
storage levels.)
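
As an illustration of that PS, the narrower public interface could look roughly 
like this (a hypothetical sketch, not a concrete proposal; {{BlockId}} and 
{{StorageLevel}} are the existing Spark classes, everything else is made up):

{code:scala}
import java.nio.ByteBuffer
import org.apache.spark.storage.{BlockId, StorageLevel}

// Sketch only: callers see block IDs and storage levels, never files or paths,
// so swapping the backing store (local FS, HDFS, Tachyon) stays an internal detail.
trait BlockManagerFacade {
  def get(blockId: BlockId): Option[ByteBuffer]
  def put(blockId: BlockId, data: ByteBuffer, level: StorageLevel): Unit
  def remove(blockId: BlockId): Boolean
}
{code}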

> Support setting spark.local.dirs to a hadoop FileSystem 
> --------------------------------------------------------
>
>                 Key: SPARK-1529
>                 URL: https://issues.apache.org/jira/browse/SPARK-1529
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Cheng Lian
>             Fix For: 1.1.0
>
>
> In some environments, like with MapR, local volumes are accessed through the 
> Hadoop filesystem interface. We should allow setting spark.local.dir to a 
> Hadoop filesystem location. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
