[ 
https://issues.apache.org/jira/browse/HADOOP-11905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537303#comment-14537303
 ] 

Kannan Rajah commented on HADOOP-11905:
---------------------------------------

[~cmccabe] Since the design and use case were fairly short and simple, I decided 
to put them in the bug description itself. If you feel that I need to provide 
further clarification, I will be happy to compile it all in a document 
and attach it.

Also, to reiterate, this abstraction is similar to the FileSystem abstraction. 
It lets individual distributions tailor the implementation to fit their use 
cases without compromising the design or performance of the general use case.

> Abstraction for LocalDirAllocator
> ---------------------------------
>
>                 Key: HADOOP-11905
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11905
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.5.2
>            Reporter: Kannan Rajah
>            Assignee: Kannan Rajah
>              Labels: BB2015-05-TBR
>             Fix For: 2.7.1
>
>         Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch
>
>
> There are 2 abstractions used to write data to local disk:
> LocalDirAllocator: Allocates paths from a set of configured local directories.
> LocalFileSystem/RawLocalFileSystem: Reads/writes using java.io.* and java.nio.*
> In the current implementation, the local disk is managed by the guest OS and 
> not by HDFS. The proposal is to provide a new abstraction that encapsulates 
> the above 2 abstractions and hides who manages the local disks. This enables 
> us to provide an alternate implementation where a DFS manages the local disks 
> and they can be accessed using HDFS APIs. This means the DFS maintains a 
> namespace for node-local directories and can create paths that are guaranteed 
> to be present on a specific node.
> Here is an example use case for Shuffle: When a mapper writes intermediate 
> data using this new implementation, it will continue to write to local disk. 
> When a reducer needs to access data from a remote node, it can use HDFS APIs 
> with a path that points to that node’s local namespace instead of having to 
> use an HTTP server to transfer the data across nodes.
> New Abstractions
> 1. LocalDiskPathAllocator
> Interface to get file/directory paths from the local disk namespace.
> This contains all the APIs that are currently supported by LocalDirAllocator. 
> So we just need to change LocalDirAllocator to implement this new interface.
> 2. LocalDiskUtil
> Helper class to get a handle to LocalDiskPathAllocator and the FileSystem
> that is used to manage those paths.
> By default, it will return LocalDirAllocator and LocalFileSystem.
> A supporting DFS can return DFSLocalDirAllocator and an instance of DFS.
> 3. DFSLocalDirAllocator
> This is a generic implementation. An allocator is created for a specific 
> node. It uses the Configuration object to get the user-configured base 
> directory and appends the node hostname to it. Hence the returned paths are 
> within the node-local namespace.
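
To make the three abstractions above concrete, here is a minimal sketch of how
they could fit together. It is not the attached patch. It uses plain strings
and a Map in place of Hadoop's Path and Configuration so that it runs on its
own, and the interface is cut down to a single method (the real
LocalDirAllocator exposes more). The configuration keys, the default values,
the round-robin choice of directory, and SimpleLocalDirAllocator are all
assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for the proposed interface (one method only). */
interface LocalDiskPathAllocator {
    String getLocalPathForWrite(String relativePath);

    /** Joins two path fragments with exactly one '/' between them. */
    static String join(String parent, String child) {
        return parent.replaceAll("/+$", "") + "/" + child.replaceAll("^/+", "");
    }
}

/** Stand-in for the existing allocator: cycles over configured local dirs. */
final class SimpleLocalDirAllocator implements LocalDiskPathAllocator {
    private final String[] localDirs;
    private int next = 0;

    SimpleLocalDirAllocator(Map<String, String> conf) {
        // "sketch.local.dirs" is a hypothetical key, not a Hadoop property.
        String dirs = conf.getOrDefault("sketch.local.dirs", "/tmp/local");
        this.localDirs = dirs.trim().split("\\s*,\\s*");
    }

    @Override
    public synchronized String getLocalPathForWrite(String relativePath) {
        String dir = localDirs[next];
        next = (next + 1) % localDirs.length; // round-robin, for illustration
        return LocalDiskPathAllocator.join(dir, relativePath);
    }
}

/** Per-node allocator: configured base directory + node hostname. */
final class DFSLocalDirAllocator implements LocalDiskPathAllocator {
    private final String nodeRoot;

    DFSLocalDirAllocator(Map<String, String> conf, String hostname) {
        // Hypothetical key and default for the base of the local namespace.
        String base = conf.getOrDefault("sketch.localdisk.dfs.basedir", "/localdisk");
        this.nodeRoot = LocalDiskPathAllocator.join(base, hostname);
    }

    @Override
    public String getLocalPathForWrite(String relativePath) {
        return LocalDiskPathAllocator.join(nodeRoot, relativePath);
    }
}

/** Helper that decides which allocator callers get. */
final class LocalDiskUtil {
    private LocalDiskUtil() {}

    static LocalDiskPathAllocator getPathAllocator(Map<String, String> conf,
                                                   String hostname) {
        // Hypothetical switch; the default mirrors "LocalDirAllocator by default".
        boolean dfsManaged = Boolean.parseBoolean(
            conf.getOrDefault("sketch.localdisk.dfs.enabled", "false"));
        return dfsManaged
            ? new DFSLocalDirAllocator(conf, hostname)
            : new SimpleLocalDirAllocator(conf);
    }
}

final class LocalDiskSketch {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("sketch.localdisk.dfs.enabled", "true");
        conf.put("sketch.localdisk.dfs.basedir", "/dfs/local");

        // Mapper on node-a asks for a path inside node-a's local namespace.
        LocalDiskPathAllocator onNodeA = LocalDiskUtil.getPathAllocator(conf, "node-a");
        System.out.println(onNodeA.getLocalPathForWrite("job_1/map_0.out"));
    }
}
```

In this sketch the path depends only on the base directory, the hostname, and
the relative name, so a reducer on another node could derive the same path by
building an allocator for the mapper's hostname and then open it through the
DFS client rather than fetching it over HTTP.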



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
