[ https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145688#comment-15145688 ]

Chris Douglas commented on HDFS-9806:
-------------------------------------

Obviously, we won't map all the semantics of HDFS to arbitrary storage systems. 
Demonstrating that the state reported by HDFS is durable, that visibility of 
data across systems is consistently ordered, etc. would require deeper coordination 
than is practical. So we make a few simplifying assumptions about the analytic 
workloads that are common in practice.

In the model this proposes, an HDFS replica may be "provided" by storage 
accessible through multiple datanodes. This is exposed to HDFS as a storage 
tier, as a peer to other, locally-managed storage media (e.g., SSD). A replica 
stored in provided media is served by a client of the backing store that 
maintains a mapping from block IDs to a corresponding identifier in the backing 
store. Examples include file regions and object identifiers. By participating 
as a storage tier, a client may use other features of HDFS (e.g., storage-level 
quotas, security) to manage the local media as a read/write cache of provided 
blocks. By using local media, we hope not only to improve performance for 
Apache Hadoop applications working with remote storage, but also to maintain 
HDFS semantics _by using HDFS_ as the storage platform.
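
To make that mapping concrete, here is a minimal sketch (hypothetical types, 
not a proposed HDFS API) of how a provided-storage client might resolve an 
HDFS block ID to a reference in the backing store, using an immutable file 
region as the identifier:

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the block-to-backing-store mapping described above.
 * A "provided" replica is resolved to a reference in the remote store; here
 * the reference is an immutable file region (URI, offset, length), but it
 * could equally be an opaque object identifier.
 */
public class ProvidedBlockMap {

  /** Reference to a region of a file or object in the backing store. */
  public static final class FileRegion {
    final String storeUri; // path or object URI in the remote store
    final long offset;     // starting byte of the block within that file/object
    final long length;     // block length in bytes

    FileRegion(String storeUri, long offset, long length) {
      this.storeUri = storeUri;
      this.offset = offset;
      this.length = length;
    }

    @Override
    public String toString() {
      return storeUri + "@" + offset + "+" + length;
    }
  }

  // HDFS block ID -> location of the block's bytes in the backing store
  private final Map<Long, FileRegion> aliases = new HashMap<>();

  void put(long blockId, FileRegion region) {
    aliases.put(blockId, region);
  }

  /** Resolve a block ID to its backing-store region, or null if unknown. */
  FileRegion resolve(long blockId) {
    return aliases.get(blockId);
  }

  public static void main(String[] args) {
    ProvidedBlockMap map = new ProvidedBlockMap();
    map.put(1073741825L,
        new FileRegion("remote://bucket/data/part-00000", 0L, 134217728L));
    System.out.println("blk_1073741825 -> " + map.resolve(1073741825L));
  }
}
{code}

An object store would substitute an opaque object ID for the (URI, offset, 
length) triple; the point is only that the client serving the provided replica 
owns the resolution to the backing store.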

This generalizes existing work (HDFS-5318) supporting “shared, read-only” data 
in the HDFS namespace. Currently, every Datanode will report a (redundant) 
replica from shared, read-only storage. This preserves two assumptions in the 
Namenode: first, that each block's storage is attached to only one Datanode, 
and second, that every replica location is tracked and reported to the 
Namenode. The Datanode implementation was not contributed to Apache, but 
presumably one would group replicas and report a subset from each Datanode to 
avoid reporting the entire cluster as a location for every shared block.
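
For illustration only (the HDFS-5318 Datanode code was not published), such 
grouping could be as simple as deterministically assigning each shared block 
to a small subset of Datanodes:

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Purely illustrative sketch of the grouping idea above: instead of every
 * Datanode reporting every shared block, each block is deterministically
 * assigned to a few Datanodes, so the Namenode sees a bounded set of
 * locations per shared block.
 */
public class SharedBlockReportGrouping {

  /** Pick the Datanodes (by index) that should report the given shared block. */
  static List<Integer> reportersFor(long blockId, int numDatanodes, int reporters) {
    List<Integer> chosen = new ArrayList<>();
    int start = (int) Math.floorMod(blockId, (long) numDatanodes);
    for (int i = 0; i < Math.min(reporters, numDatanodes); i++) {
      chosen.add((start + i) % numDatanodes);
    }
    return chosen;
  }

  public static void main(String[] args) {
    // With 5 Datanodes and 3 reporters per block, each shared block is
    // reported from 3 nodes rather than from the whole cluster.
    for (long blockId = 100; blockId < 103; blockId++) {
      System.out.println("blk_" + blockId + " reported by datanodes "
          + reportersFor(blockId, 5, 3));
    }
  }
}
{code}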

In contrast, provided storage does not produce reports of every block 
accessible through it. Each Datanode registers a consistent storage ID with the 
Namenode, which is configured to refresh both the block mappings and, where 
applicable, the corresponding namespace.
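
As a rough illustration of the contrast (hypothetical types and IDs, not the 
actual Datanode-Namenode protocol), local storages keep per-node IDs while 
every Datanode exposing the provided tier registers the same, consistent 
storage ID:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical illustration: each Datanode registers its local media under
 * per-node storage IDs, but the provided tier is registered under one shared,
 * consistent storage ID on every Datanode, rather than through per-block
 * replica reports for the remote store.
 */
public class ProvidedStorageRegistration {

  enum StorageType { DISK, SSD, PROVIDED }

  // Single, cluster-wide ID for the provided tier (configured, not generated per node).
  static final String PROVIDED_STORAGE_ID = "provided-store-0";

  static Map<String, StorageType> storagesFor(String datanode) {
    Map<String, StorageType> storages = new LinkedHashMap<>();
    // Local media get a unique storage ID per Datanode.
    storages.put("DS-" + datanode + "-" + UUID.randomUUID(), StorageType.DISK);
    // The provided tier is registered under the same ID on every Datanode.
    storages.put(PROVIDED_STORAGE_ID, StorageType.PROVIDED);
    return storages;
  }

  public static void main(String[] args) {
    for (String dn : new String[] {"dn1", "dn2", "dn3"}) {
      System.out.println(dn + " registers " + storagesFor(dn));
    }
  }
}
{code}

Because the provided tier is identified by one consistent storage ID rather 
than by per-block reports, the Namenode can treat it as a single logical 
storage reachable through many Datanodes.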

We're working on a design document, where we will expand on the motivation, use 
cases, and implementation changes we propose to make in HDFS.

> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9806
>                 URL: https://issues.apache.org/jira/browse/HDFS-9806
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Chris Douglas
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
