[
https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202333#comment-15202333
]
Virajith Jalaparti commented on HDFS-9809:
------------------------------------------
The motivation behind this JIRA is HDFS-9806 where data can be stored in remote
filesystems and datanodes will not hold the block data in local files. The
current implementation of the Datanode assumes that block data is always
located in {{java.io.File}} (e.g., {{FsVolumeSpi#getBasePath}}). This JIRA aims
to constrain this assumption to the classes that directly access/read/write the
block data ({{FsVolumeImpl}}, and {{ReplicaInfo}}). This will enable us to
minimize the changes to the datanode in HDFS-9806 – for example, checks that
the actual block data is not stored as a {{java.io.File}} but at a remote URI
can be constrained to within {{FsVolumeImpl}} and don’t have to be added to parts
of the datanode which access or can potentially access
{{FsVolumeSpi#getBasePath}}.
Below we list the reasons behind the changes (in the patch submitted) to
different classes in the datanode.
h4. ReplicaInfo
The {{java.io.File}} related APIs in {{ReplicaInfo}} ({{getBlockFile}},
{{getMetaFile}}) are moved to a subclass of {{ReplicaInfo}} called
{{LocalReplica}}. The classes {{FinalizedReplica}}, {{ReplicaInPipeline}},
{{ReplicaUnderRecovery}}, and {{ReplicaWaitingToBeRecovered}} are changed to be
subclasses of {{LocalReplica}} instead of {{ReplicaInfo}}. The motivation
behind this change is that we can have {{ReplicaInfo}} s that point to blocks
located in remote stores and as a result don’t have associated {{java.io.File}}
s.
We added various functions to {{ReplicaInfo}} in order to replace the calls to
{{ReplicaInfo#getBlockFile}} and {{ReplicaInfo#getMetaFile}} in the rest of
the code.
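To illustrate the split described above, here is a minimal sketch. All class names carry a {{Sketch}} suffix because they are hypothetical, simplified stand-ins and not the classes in the patch; the location-neutral accessor {{getBlockURI}} is likewise an assumption about what one of the replacement functions could look like.

```java
import java.io.File;
import java.net.URI;

// Hypothetical, simplified stand-in for ReplicaInfo (not the Hadoop class).
abstract class ReplicaInfoSketch {
    private final long blockId;

    ReplicaInfoSketch(long blockId) {
        this.blockId = blockId;
    }

    long getBlockId() {
        return blockId;
    }

    // Assumed location-neutral accessor that callers could use
    // instead of a File-returning getBlockFile().
    abstract URI getBlockURI();
}

// File-backed replica: the only class here that exposes java.io.File.
class LocalReplicaSketch extends ReplicaInfoSketch {
    private final File blockFile;

    LocalReplicaSketch(long blockId, File blockFile) {
        super(blockId);
        this.blockFile = blockFile;
    }

    File getBlockFile() {
        return blockFile;
    }

    @Override
    URI getBlockURI() {
        return blockFile.toURI();
    }
}

// Replica whose data lives in a remote store; it has no File at all.
class RemoteReplicaSketch extends ReplicaInfoSketch {
    private final URI remoteUri;

    RemoteReplicaSketch(long blockId, URI remoteUri) {
        super(blockId);
        this.remoteUri = remoteUri;
    }

    @Override
    URI getBlockURI() {
        return remoteUri;
    }
}
```

Code that only needs to know where a block lives can work against {{ReplicaInfoSketch}}; only code that really needs a file has to ask for a {{LocalReplicaSketch}}.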
h4. FsVolumeSpi and StorageLocation
Instead of associating an FsVolume with a base path (which is a
{{java.io.File}}), we associate it with a {{StorageLocation}}. This removes
the dependence on {{java.io.File}} in favor of a more general type that can
point to either a {{java.io.File}} or an abstract {{URI}} representing
external storage. Using {{StorageLocation}} instead of defining
a new type for location allows us to reuse its functionality and plug into the
rest of the code easily. Following this intuition, we replaced
{{FsVolumeSpi#getBasePath}} with {{FsVolumeSpi#getStorageLocation}}. As a
result, comparisons and references to FsVolumes which were done using the
{{java.io.File}} returned by {{FsVolumeSpi#getBasePath}} are now replaced by
comparisons and references to the {{StorageLocation}} returned by
{{FsVolumeSpi#getStorageLocation}}.
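As a rough sketch of what comparing volumes by location rather than by file could look like: the types below are hypothetical stand-ins (note the {{Sketch}} suffix), and wrapping a normalized {{URI}} is an assumption made for illustration, not a description of how {{StorageLocation}} is implemented.

```java
import java.io.File;
import java.net.URI;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for StorageLocation: wraps a URI that may be
// a local file: URI or a URI for some external store.
final class StorageLocationSketch {
    private final URI uri;

    StorageLocationSketch(URI uri) {
        this.uri = Objects.requireNonNull(uri).normalize();
    }

    static StorageLocationSketch fromFile(File dir) {
        return new StorageLocationSketch(dir.toURI());
    }

    URI getUri() {
        return uri;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof StorageLocationSketch
            && uri.equals(((StorageLocationSketch) o).uri);
    }

    @Override
    public int hashCode() {
        return uri.hashCode();
    }
}

// Hypothetical volume interface exposing a location instead of a base path.
interface VolumeSketch {
    StorageLocationSketch getStorageLocation();
}

final class VolumeLookupSketch {
    // Finds a volume by comparing locations; returns null if none matches.
    static VolumeSketch find(List<VolumeSketch> volumes,
                             StorageLocationSketch target) {
        for (VolumeSketch v : volumes) {
            if (v.getStorageLocation().equals(target)) {
                return v;
            }
        }
        return null;
    }
}
```

The lookup works the same way for local and remote volumes because it never asks for a {{java.io.File}}.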
Extending this further, we attempted to make the following changes to the
Datanode: (a) associate {{StorageDirectory}} with {{StorageLocation}}, instead
of {{java.io.File}} (replacing calls to {{StorageDirectory#getRoot}} by
{{StorageDirectory#getStorageLocation}}) and (b) remove references to
{{StorageLocation#getFile}}.
h4. DirectoryScanner.ReportCompiler
The {{DirectoryScanner.ReportCompiler}} calls
{{FsVolumeSpi#getFinalizedDir}} and compiles the report assuming that this
returns a {{java.io.File}}. However, in HDFS-9806, data may not be stored in
files. Further, the {{DirectoryScanner.ReportCompiler#compileReport}} function
makes assumptions about how blocks are laid out in FsVolumes, which can differ
between {{FsVolumeSpi}} implementations. To remove these assumptions and let
each volume encapsulate the details of its own storage, we moved
{{ReportCompiler#compileReport}} into {{FsVolumeSpi}}, so that each
implementation provides its own version.
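The idea can be sketched as follows. Everything here is hypothetical and heavily simplified: the report is just a list of block names, the local volume treats any entry starting with {{blk_}} as a block, and the remote volume reads from an in-memory index. None of these names or signatures are taken from the patch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical volume interface: each implementation enumerates its own blocks.
interface ScannableVolumeSketch {
    List<String> compileReport();
}

// File-backed volume: lists "blk_*" entries directly under its directory.
class LocalVolumeSketch implements ScannableVolumeSketch {
    private final File finalizedDir;

    LocalVolumeSketch(File finalizedDir) {
        this.finalizedDir = finalizedDir;
    }

    @Override
    public List<String> compileReport() {
        List<String> report = new ArrayList<>();
        File[] files = finalizedDir.listFiles(); // null if the directory is missing
        if (files != null) {
            for (File f : files) {
                if (f.getName().startsWith("blk_")) {
                    report.add(f.getName());
                }
            }
        }
        return report;
    }
}

// Remote-backed volume: reports blocks from an in-memory index, no File involved.
class RemoteVolumeSketch implements ScannableVolumeSketch {
    private final List<String> blockIndex;

    RemoteVolumeSketch(Collection<String> blocks) {
        this.blockIndex = new ArrayList<>(blocks);
    }

    @Override
    public List<String> compileReport() {
        return new ArrayList<>(blockIndex);
    }
}

// The scanner only aggregates; it does not need to know the storage layout.
final class ScannerSketch {
    static List<String> scan(List<? extends ScannableVolumeSketch> volumes) {
        List<String> all = new ArrayList<>();
        for (ScannableVolumeSketch v : volumes) {
            all.addAll(v.compileReport());
        }
        return all;
    }
}
```

With this shape, adding a new kind of volume means implementing {{compileReport}} for it, without changing the scanner.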
h4. FsDatasetImpl
Currently, functions in {{FsDatasetImpl}} that create new {{ReplicaInfo}}
objects (in different states such as RUR, Temporary, and RBW, as part of the
data pipeline) all assume that blocks are backed by {{java.io.File}} s.
To remove this dependency, we moved these functions into
{{FsVolumeImpl}}. This provides the flexibility for the {{FsVolumeImpl}} to
handle {{ReplicaInfo}} s as it sees fit. In particular, if a certain
{{FsVolumeImpl}} uses external storage to store block data, it can perform
these functions appropriately.
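A minimal sketch of this delegation, under stated assumptions: the names below are hypothetical, a replica is reduced to an id, a state label, and a location, and only RBW creation is shown. It is not the code in the patch.

```java
import java.io.File;
import java.net.URI;

// Hypothetical replica handle: just an id, a state label, and a location.
final class ReplicaHandleSketch {
    final long blockId;
    final String state;
    final URI location;

    ReplicaHandleSketch(long blockId, String state, URI location) {
        this.blockId = blockId;
        this.state = state;
        this.location = location;
    }
}

// Hypothetical volume interface that owns replica creation.
interface ReplicaFactoryVolumeSketch {
    ReplicaHandleSketch createRbw(long blockId);
}

// File-backed volume: places the new replica under a local rbw directory.
class FileVolumeSketch implements ReplicaFactoryVolumeSketch {
    private final File rbwDir;

    FileVolumeSketch(File rbwDir) {
        this.rbwDir = rbwDir;
    }

    @Override
    public ReplicaHandleSketch createRbw(long blockId) {
        File blockFile = new File(rbwDir, "blk_" + blockId);
        return new ReplicaHandleSketch(blockId, "RBW", blockFile.toURI());
    }
}

// Externally backed volume: derives the replica location from a base URI.
// The base URI is expected to end in "/" so that resolve() appends to it.
class ExternalVolumeSketch implements ReplicaFactoryVolumeSketch {
    private final URI base;

    ExternalVolumeSketch(URI base) {
        this.base = base;
    }

    @Override
    public ReplicaHandleSketch createRbw(long blockId) {
        return new ReplicaHandleSketch(blockId, "RBW", base.resolve("blk_" + blockId));
    }
}

// Dataset-level code delegates and never touches java.io.File itself.
final class DatasetSketch {
    static ReplicaHandleSketch createRbw(ReplicaFactoryVolumeSketch volume,
                                         long blockId) {
        return volume.createRbw(blockId);
    }
}
```

The dataset-level method is the same for both volumes; the choice between a local file and a remote URI stays inside the volume.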
> Abstract implementation-specific details from the datanode
> ----------------------------------------------------------
>
> Key: HDFS-9809
> URL: https://issues.apache.org/jira/browse/HDFS-9809
> Project: Hadoop HDFS
> Issue Type: Task
> Reporter: Virajith Jalaparti
>
> Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FSVolumeImpl etc.)
> implicitly assume that blocks are stored in java.io.File(s) and that volumes
> are divided into directories. We propose to abstract these details, which
> would help in supporting other storages.