[ 
https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202333#comment-15202333
 ] 

Virajith Jalaparti commented on HDFS-9809:
------------------------------------------

The motivation behind this JIRA is HDFS-9806 where data can be stored in remote 
filesystems and datanodes will not hold the block data in local files. The 
current implementation of the Datanode assumes that block data is always 
located in {{java.io.File}} (e.g., {{FsVolumeSpi#getBasePath}}). This JIRA aims 
to constrain this assumption to the classes that directly access/read/write the 
block data ({{FsVolumeImpl}}, and {{ReplicaInfo}}). This will enable us to 
minimize the changes to the datanode in HDFS-9806 – for example, checks that 
the actual block data is not stored as a {{java.io.File}} but at a remote URI 
can be confined to {{FsVolumeImpl}} and don’t have to be added to parts 
of the datanode which access or can potentially access 
{{FsVolumeSpi#getBasePath}}. 
Below we list the reasons behind the changes (in the patch submitted) to 
different classes in the datanode. 

h4. ReplicaInfo
The {{java.io.File}} related APIs in {{ReplicaInfo}} ({{getBlockFile}}, 
{{getMetaFile}}) are moved to a subclass of {{ReplicaInfo}} called 
{{LocalReplica}}. The classes {{FinalizedReplica}}, {{ReplicaInPipeline}}, 
{{ReplicaUnderRecovery}}, and {{ReplicaWaitingToBeRecovered}} are changed to be 
subclasses of {{LocalReplica}} instead of {{ReplicaInfo}}. The motivation 
behind this change is that we can have {{ReplicaInfo}} s that point to blocks 
located in remote stores and as a result don’t have associated {{java.io.File}} 
s. 
We added various functions to {{ReplicaInfo}} in order to replace the calls to 
{{ReplicaInfo#getBlockFile}}, and {{ReplicaInfo#getMetaFile}} in the rest of 
the code. 
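
The split described above can be sketched roughly as follows. This is only an illustration with simplified stand-ins for the real classes: {{RemoteReplica}}, the {{getBlockURI}} accessor, the constructors, and the file naming are assumptions made for the example and are not taken from the patch.

```java
import java.io.File;
import java.net.URI;

// Simplified stand-in for ReplicaInfo: the base API exposes no java.io.File.
abstract class ReplicaInfo {
  private final long blockId;

  ReplicaInfo(long blockId) {
    this.blockId = blockId;
  }

  long getBlockId() {
    return blockId;
  }

  // Hypothetical accessor: where the block data lives, expressed as a URI.
  abstract URI getBlockURI();
}

// File-backed replicas keep the java.io.File accessors.
class LocalReplica extends ReplicaInfo {
  private final File dir;

  LocalReplica(long blockId, File dir) {
    super(blockId);
    this.dir = dir;
  }

  File getBlockFile() {
    return new File(dir, "blk_" + getBlockId());
  }

  // Illustrative naming only; the real meta file name carries more fields.
  File getMetaFile() {
    return new File(dir, "blk_" + getBlockId() + ".meta");
  }

  @Override
  URI getBlockURI() {
    return getBlockFile().toURI();
  }
}

// Hypothetical replica whose data sits in a remote store; it has no File.
class RemoteReplica extends ReplicaInfo {
  private final URI remote;

  RemoteReplica(long blockId, URI remote) {
    super(blockId);
    this.remote = remote;
  }

  @Override
  URI getBlockURI() {
    return remote;
  }
}
```

Code that only needs a {{ReplicaInfo}} can then work with either kind of replica, while code that genuinely needs a file has to ask for a {{LocalReplica}} explicitly.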

h4. FsVolumeSpi and StorageLocation
Instead of associating an FsVolume with a base path (which is a 
{{java.io.File}}), we associate it with a {{StorageLocation}}. This allows us 
to remove the dependence on {{java.io.File}} and replace it with a more 
general type that can point either to a {{java.io.File}} or to an abstract 
{{URI}} representing external storage. Using {{StorageLocation}} instead of defining 
a new type for location allows us to reuse its functionality and plug into the 
rest of the code easily. Following this intuition, we replaced 
{{FsVolumeSpi#getBasePath}} with {{FsVolumeSpi#getStorageLocation}}. As a 
result, comparisons and references to FsVolumes which were done using the 
{{java.io.File}} returned by {{FsVolumeSpi#getBasePath}} are now replaced by 
comparisons and references to the {{StorageLocation}} returned by 
{{FsVolumeSpi#getStorageLocation}}. 
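
As a rough illustration of comparing and looking up volumes by location rather than by {{java.io.File}}, consider the sketch below. {{LocationSketch}}, {{LocatedVolume}} and {{VolumeRegistry}} are hypothetical, simplified stand-ins; the real {{StorageLocation}} and {{FsVolumeSpi}} carry more state than a single URI.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for StorageLocation: its identity is just a URI,
// which may name a local directory or an external store.
final class LocationSketch {
  private final URI uri;

  LocationSketch(URI uri) {
    this.uri = Objects.requireNonNull(uri);
  }

  URI getUri() {
    return uri;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof LocationSketch && uri.equals(((LocationSketch) o).uri);
  }

  @Override
  public int hashCode() {
    return uri.hashCode();
  }
}

// Hypothetical stand-in for a volume that exposes a location, not a base path.
interface LocatedVolume {
  LocationSketch getStorageLocation();
}

// Volumes are keyed and compared by location instead of by java.io.File.
final class VolumeRegistry {
  private final Map<LocationSketch, LocatedVolume> volumes = new HashMap<>();

  void add(LocatedVolume volume) {
    volumes.put(volume.getStorageLocation(), volume);
  }

  LocatedVolume find(LocationSketch location) {
    return volumes.get(location);
  }
}
```

Because equality is defined on the location, the same lookup works whether the volume is backed by a local directory or by a remote URI.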

Extending this further, we attempted to make the following changes to the 
Datanode: (a) associate {{StorageDirectory}} with {{StorageLocation}}, instead 
of {{java.io.File}} (replacing calls to {{StorageDirectory#getRoot}} by 
{{StorageDirectory#getStorageLocation}}) and (b) remove references to 
{{StorageLocation#getFile}}. 

h4. DirectoryScanner.ReportCompiler
The {{DirectoryScanner.ReportCompiler}} calls 
{{FsVolumeSpi#getFinalizedDir}} and compiles the report assuming that this 
returns a {{java.io.File}}. However, in HDFS-9806, data may not be stored in 
files. Further, the {{DirectoryScanner.ReportCompiler#compileReport}} function 
assumes a particular way in which blocks are stored in FsVolumes, which can 
differ between {{FsVolumeSpi}} implementations. To remove these assumptions 
and to keep the details of how a volume implements its storage inside the 
volume itself, we moved the {{ReportCompiler#compileReport}} function into 
{{FsVolumeSpi}}, so that each implementation provides its own version. 
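
A minimal sketch of this idea, under the assumption of a much simpler report (a list of block names) than the real one: the interface and class names below are hypothetical, and the "remote" volume simply reads from an in-memory index to stand in for an external store.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical volume interface: each volume compiles its own report,
// so the scanner never needs to know how blocks are laid out.
interface ScannableVolume {
  List<String> compileReport();
}

// Local volume: finds blocks by listing a directory.
final class LocalDirVolume implements ScannableVolume {
  private final File finalizedDir;

  LocalDirVolume(File finalizedDir) {
    this.finalizedDir = finalizedDir;
  }

  @Override
  public List<String> compileReport() {
    List<String> report = new ArrayList<>();
    String[] names = finalizedDir.list(); // null if this is not a directory
    if (names != null) {
      for (String name : names) {
        if (name.startsWith("blk_") && !name.endsWith(".meta")) {
          report.add(name);
        }
      }
    }
    Collections.sort(report);
    return report;
  }
}

// Stand-in for a volume backed by external storage: no directories at all.
final class RemoteIndexVolume implements ScannableVolume {
  private final List<String> index;

  RemoteIndexVolume(List<String> index) {
    this.index = new ArrayList<>(index);
  }

  @Override
  public List<String> compileReport() {
    List<String> report = new ArrayList<>(index);
    Collections.sort(report);
    return report;
  }
}

// The scanner only aggregates what the volumes report.
final class ScannerSketch {
  static List<String> scan(List<ScannableVolume> volumes) {
    List<String> all = new ArrayList<>();
    for (ScannableVolume volume : volumes) {
      all.addAll(volume.compileReport());
    }
    return all;
  }
}
```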

h4. FsDatasetImpl
Currently, functions in {{FsDatasetImpl}} that create new {{ReplicaInfo}} 
objects (under different states RUR, Temporary, RBW etc. as part of the data 
pipeline) all contain the assumption that blocks are associated with 
{{java.io.File}} s. To remove this dependency, we moved these functions into 
{{FsVolumeImpl}}. This provides the flexibility for the {{FsVolumeImpl}} to 
handle {{ReplicaInfo}} s as it sees fit. In particular, if a certain 
{{FsVolumeImpl}} uses external storage to store block data, it can perform 
these functions appropriately. 
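
The delegation can be pictured with the sketch below, where the dataset asks the volume to create a replica and the volume decides where the data goes. All names here ({{ReplicaFactoryVolume}}, {{ReplicaHandle}}, {{SketchState}}, the {{rbw/}} path component) are assumptions made for illustration and do not reflect the method signatures in the patch.

```java
import java.net.URI;

// Hypothetical, reduced set of replica states for the example.
enum SketchState { TEMPORARY, RBW }

// Hypothetical replica record: where the data lives is just a URI.
final class ReplicaHandle {
  final long blockId;
  final SketchState state;
  final URI location;

  ReplicaHandle(long blockId, SketchState state, URI location) {
    this.blockId = blockId;
    this.state = state;
    this.location = location;
  }
}

// Replica creation lives on the volume, not on the dataset.
interface ReplicaFactoryVolume {
  ReplicaHandle createRbw(long blockId);
}

// A local volume places the replica under a directory of its base URI.
final class LocalFactoryVolume implements ReplicaFactoryVolume {
  private final URI base; // expected to end with '/'

  LocalFactoryVolume(URI base) {
    this.base = base;
  }

  @Override
  public ReplicaHandle createRbw(long blockId) {
    return new ReplicaHandle(blockId, SketchState.RBW,
        base.resolve("rbw/blk_" + blockId));
  }
}

// A volume backed by external storage is free to use a different layout.
final class RemoteFactoryVolume implements ReplicaFactoryVolume {
  private final URI store; // expected to end with '/'

  RemoteFactoryVolume(URI store) {
    this.store = store;
  }

  @Override
  public ReplicaHandle createRbw(long blockId) {
    return new ReplicaHandle(blockId, SketchState.RBW,
        store.resolve("blk_" + blockId));
  }
}

// The dataset no longer assumes files; it only delegates.
final class DatasetSketch {
  static ReplicaHandle createRbw(ReplicaFactoryVolume volume, long blockId) {
    return volume.createRbw(blockId);
  }
}
```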

> Abstract implementation-specific details from the datanode
> ----------------------------------------------------------
>
>                 Key: HDFS-9809
>                 URL: https://issues.apache.org/jira/browse/HDFS-9809
>             Project: Hadoop HDFS
>          Issue Type: Task
>            Reporter: Virajith Jalaparti
>
> Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FsVolumeImpl etc.) 
> implicitly assume that blocks are stored in java.io.File(s) and that volumes 
> are divided into directories. We propose to abstract these details, which 
> would help in supporting other storages. 



