[ https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145688#comment-15145688 ]
Chris Douglas commented on HDFS-9806:
-------------------------------------
Obviously, we won't map all the semantics of HDFS to arbitrary storage systems.
Demonstrating that the state reported by HDFS is durable, that visibility of
data across systems is consistently ordered, etc. would require deeper coordination
than is practical. So we make a few simplifying assumptions about the analytic
workloads that are common in practice.
In the model this issue proposes, an HDFS replica may be "provided" by storage
accessible through multiple datanodes. It is exposed to HDFS as a storage
tier, a peer to other, locally managed storage media (e.g., SSD). A replica
stored in provided media is served by a client of the backing store that
maintains a mapping from block IDs to a corresponding identifier in the backing
store. Examples include file regions and object identifiers. By participating
as a storage tier, a client may use other features of HDFS (e.g., storage-level
quotas, security) to manage the local media as a read/write cache of provided
blocks. By using local media, we hope to not only improve performance for
Apache Hadoop applications working with remote storage, but also to maintain
HDFS semantics _by using HDFS_ as the storage platform.
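As a rough illustration of the block-to-backing-store mapping described above, a minimal sketch follows. The names (AliasMap, FileRegion) and the in-memory implementation are placeholders for this example, not the interfaces we're proposing:
{code:java}
// Illustrative sketch only: a hypothetical map from HDFS block IDs to
// locations in an external store. Names are placeholders, not HDFS APIs.
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** A location of block data inside the backing store. */
final class FileRegion {
  final String path;   // e.g., a path or object key in the remote store
  final long offset;   // byte offset of the block data
  final long length;   // number of bytes backing the block
  FileRegion(String path, long offset, long length) {
    this.path = path; this.offset = offset; this.length = length;
  }
  @Override public String toString() {
    return path + "@" + offset + "+" + length;
  }
}

/** Maps HDFS block IDs to regions in the external store. */
interface AliasMap {
  Optional<FileRegion> resolve(long blockId);
}

/** In-memory implementation; a real one could be backed by a file or DB. */
class InMemoryAliasMap implements AliasMap {
  private final Map<Long, FileRegion> regions = new ConcurrentHashMap<>();
  void put(long blockId, FileRegion region) { regions.put(blockId, region); }
  @Override public Optional<FileRegion> resolve(long blockId) {
    return Optional.ofNullable(regions.get(blockId));
  }
}

public class AliasMapExample {
  public static void main(String[] args) {
    InMemoryAliasMap map = new InMemoryAliasMap();
    // Block 1073741825 is backed by a 128 MB region of a remote object.
    map.put(1073741825L,
        new FileRegion("s3a://bucket/data/part-00000", 0L, 134217728L));
    map.resolve(1073741825L).ifPresent(r -> System.out.println("block -> " + r));
  }
}
{code}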
This generalizes existing work (HDFS-5318) supporting “shared, read-only” data
in the HDFS namespace. Currently, every Datanode will report a (redundant)
replica from shared, read-only storage. This preserves two assumptions in the
Namenode: first, that every storage is attached to only one Datanode, and
second, that every replica location is tracked and reported to the
Namenode. The Datanode implementation was not contributed to Apache, but
presumably one would group replicas and report a subset from each Datanode to
avoid reporting the entire cluster as a location for every shared block.
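For example, each Datanode could deterministically choose which shared blocks it reports, e.g. by hashing the block ID against a stable ordering of the datanodes. The sketch below is only one way to do that, not the HDFS-5318 implementation (which was not contributed):
{code:java}
// Illustrative sketch only: report a shared block from a small, deterministic
// subset of datanodes rather than from every node in the cluster.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SharedBlockReportExample {
  /** Deterministically pick which datanodes report a given shared block. */
  static List<String> reportersFor(long blockId, List<String> datanodeIds,
                                   int replicasToReport) {
    List<String> sorted = new ArrayList<>(datanodeIds);
    Collections.sort(sorted);                     // stable order on every node
    int start = Math.floorMod(Long.hashCode(blockId), sorted.size());
    List<String> reporters = new ArrayList<>();
    for (int i = 0; i < Math.min(replicasToReport, sorted.size()); i++) {
      reporters.add(sorted.get((start + i) % sorted.size()));
    }
    return reporters;
  }

  public static void main(String[] args) {
    List<String> dns = Arrays.asList("dn-3", "dn-1", "dn-2", "dn-4");
    // Only two nodes report this shared block, chosen the same way everywhere.
    System.out.println(reportersFor(1073741825L, dns, 2));
  }
}
{code}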
In contrast, provided storage does not produce reports of every block
accessible through it. Each Datanode registers a consistent storage ID with the
Namenode, which is configured to refresh both the block mappings and, where
applicable, the corresponding namespace.
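To illustrate the contrast with locally managed volumes, the sketch below has a datanode register a fixed, shared storage ID for the provided tier alongside per-volume unique IDs. The class names and ID format are placeholders for this example, not the actual HDFS types or identifiers:
{code:java}
// Illustrative sketch only: every datanode backed by the same external store
// registers the same storage ID, while local volumes keep unique IDs.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class ProvidedStorageIdExample {
  enum StorageKind { DISK, SSD, PROVIDED }

  /** Simplified stand-in for the per-storage report a datanode registers. */
  static final class StorageReport {
    final String storageId;
    final StorageKind kind;
    StorageReport(String storageId, StorageKind kind) {
      this.storageId = storageId; this.kind = kind;
    }
  }

  // Configured identically on every datanode attached to the external store,
  // so the Namenode sees one consistent "provided" storage.
  static final String SHARED_PROVIDED_ID = "DS-PROVIDED-example";

  static Map<String, StorageReport> buildReports() {
    Map<String, StorageReport> reports = new LinkedHashMap<>();
    // Local media keep per-volume unique IDs, as today.
    reports.put("disk0",
        new StorageReport("DS-" + UUID.randomUUID(), StorageKind.DISK));
    // The provided tier registers the shared, consistent ID.
    reports.put("provided",
        new StorageReport(SHARED_PROVIDED_ID, StorageKind.PROVIDED));
    return reports;
  }

  public static void main(String[] args) {
    buildReports().forEach((name, r) ->
        System.out.println(name + " -> " + r.storageId + " (" + r.kind + ")"));
  }
}
{code}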
We're working on a design document that expands on the motivation, use cases,
and implementation changes we propose to make in HDFS.
> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Chris Douglas
>
> In addition to heterogeneous media, many applications work with heterogeneous
> storage systems. The guarantees and semantics provided by these systems are
> often similar, but not identical to those of
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
> Any client accessing multiple storage systems is responsible for reasoning
> about each system independently, and must propagate and renew credentials for
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to
> immutable file regions, opaque IDs, or other tokens that represent a
> consistent view of the data. While correctness for arbitrary operations
> requires careful coordination between stores, in practice we can provide
> workable semantics with weaker guarantees.