[ 
https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322378#comment-15322378
 ] 

Ewan Higgs commented on HDFS-9806:
----------------------------------

Hi all. I worked with [~PieterReuse] and [~thodemoor] to get the PoC working against 
S3 through s3a. It required a few minor changes, which we can submit as a PR to 
the Microsoft project on GitHub if you’re interested.

In using the proof of concept, we came up with more questions:

1. It makes sense to us for there to be a series of commands to attach, detach, 
and rescan provided storage from the command line. This way, administrators can 
attach provided storage to the system without having to restart any nodes. We 
propose {{hdfs providedstorage \[-mount,-unmount,-rescan\]}}. We could also 
name them {{attach}} and {{detach}}.
 
{{hdfs providedstorage -c 
org.apache.hadoop.hdfs.server.common.S3AFileRegionFormat -mount s3a://bucket 
/s3a-bucket}}
 
This would reuse many of the concepts and much of the code from fs2img, but 
instead of dumping an fsimage to disk it would send a series of updates to the 
NN. {{-rescan}} would be similar but needs only the mount point as an argument. 
{{-unmount}} would also need only the mount point, but would additionally 
require that the {{StoragePolicy}} is set to {{PROVIDED}} (i.e. there is no DISK 
or MEM replication to unwind).
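
To make the argument shapes concrete, here is a rough sketch of how the proposed options could be parsed. Everything in it is hypothetical: {{ProvidedStorageArgs}} is not a class in Hadoop or in the PoC, and treating {{-c}} as optional is our assumption. It only encodes what is described above: {{-mount}} takes a remote URI and a mount point, while {{-unmount}} and {{-rescan}} take only the mount point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch only: this class does not exist in Hadoop or in the PoC.
public class ProvidedStorageArgs {
    enum Action { MOUNT, UNMOUNT, RESCAN }

    final Action action;
    final String formatClass; // value of -c; null when not given (assumed optional)
    final String remoteUri;   // set only for MOUNT
    final String mountPoint;

    private ProvidedStorageArgs(Action action, String formatClass,
                                String remoteUri, String mountPoint) {
        this.action = action;
        this.formatClass = formatClass;
        this.remoteUri = remoteUri;
        this.mountPoint = mountPoint;
    }

    static ProvidedStorageArgs parse(String... args) {
        Action action = null;
        String formatClass = null;
        List<String> operands = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-c")) {
                if (i + 1 == args.length) {
                    throw new IllegalArgumentException("-c requires a class name");
                }
                formatClass = args[++i];
            } else if (arg.equals("-mount") || arg.equals("-unmount")
                    || arg.equals("-rescan")) {
                if (action != null) {
                    throw new IllegalArgumentException("only one action may be given");
                }
                // "-mount" -> MOUNT, "-unmount" -> UNMOUNT, "-rescan" -> RESCAN
                action = Action.valueOf(arg.substring(1).toUpperCase(Locale.ROOT));
            } else {
                operands.add(arg);
            }
        }
        if (action == null) {
            throw new IllegalArgumentException(
                "one of -mount, -unmount, -rescan is required");
        }
        // -mount takes <remote URI> <mount point>; the others take <mount point> only.
        int expected = (action == Action.MOUNT) ? 2 : 1;
        if (operands.size() != expected) {
            throw new IllegalArgumentException(
                action + " expects " + expected + " operand(s)");
        }
        String remoteUri = (action == Action.MOUNT) ? operands.get(0) : null;
        String mountPoint = operands.get(operands.size() - 1);
        return new ProvidedStorageArgs(action, formatClass, remoteUri, mountPoint);
    }
}
```

The {{StoragePolicy}} check for {{-unmount}} is deliberately left out, since it would have to happen on the NN rather than in the client-side parser.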

Alternatively, we could just assume that a NN managing provided storage will be 
preoccupied with managing the provided store and would be federated with other 
NNs to mount the provided storage. This uses existing functionality to keep a 
separation of concerns (storage tiering vs. managing where remote storage 
systems are attached to the system).

2. Following on from @zhz’s questions: the {{PROVIDED}} blocks are not stored 
with the {{INodeFile}} (instead, the {{BlockManager}} inserts the locations in 
{{createBlockLocation}}), so it seems that anything trying to manage the 
replicas would need special logic to permit n-1 replicas on the system 
if one element of the storage policy is {{PROVIDED}}. That logic would exist 
just for the {{PROVIDED}} storage type, which seems like a burden 
to maintain. Is this a correct understanding of the proposal? Is there a way to 
disentangle the block types so there doesn’t need to be a special case like ‘if 
this is a PROVIDED block, then do some special replication logic that nothing 
else in HDFS has ever done’?
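
To illustrate the kind of special case we mean, here is a minimal sketch. The names ({{ProvidedReplicaSketch}}, {{StorageKind}}, {{expectedLocalReplicas}}) are made up for this comment and are not HDFS classes; it assumes exactly one replica is accounted for by the external store whenever the policy contains {{PROVIDED}}.

```java
import java.util.List;

// Hypothetical sketch: these names are illustrative, not HDFS classes.
public class ProvidedReplicaSketch {
    enum StorageKind { DISK, MEM, PROVIDED }

    // Number of replicas the cluster itself must hold for replication factor n.
    static int expectedLocalReplicas(int replication, List<StorageKind> policy) {
        // The special case in question: if any element of the policy is
        // PROVIDED, one replica is assumed to live in the external store.
        if (policy.contains(StorageKind.PROVIDED)) {
            return Math.max(0, replication - 1);
        }
        return replication;
    }

    // Every component that counts replicas would need to go through this check.
    static boolean isUnderReplicated(int liveLocalReplicas, int replication,
                                     List<StorageKind> policy) {
        return liveLocalReplicas < expectedLocalReplicas(replication, policy);
    }
}
```

With replication 3 and two live local replicas, a policy of DISK, DISK, PROVIDED would count as fully replicated, while DISK, DISK, DISK would not. That branch is the maintenance burden we are asking about.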

3. If we want to attach multiple provided storage locations within a single 
NN, would this mean multiple {{ProvidedStorageMap}} objects in the 
{{BlockManager}}, or do you think this should be a mapping in 
{{ProvidedStorageMap}} to multiple {{ProvidedStorageMap.BlockProvided}} subtype 
objects?
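
The two shapes we have in mind look roughly like this. All types below are stand-ins invented for this comment ({{ProvidedBlocks}} plays the role of a {{BlockProvided}} subtype, {{SingleStoreMap}} the role of today's one-store {{ProvidedStorageMap}}); none of it is taken from the PoC code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: stand-in types, not the PoC's ProvidedStorageMap.
public class MultiMountSketch {
    // Stand-in for one external store's block list.
    static final class ProvidedBlocks {
        final String remoteUri;
        ProvidedBlocks(String remoteUri) { this.remoteUri = remoteUri; }
    }

    // Stand-in for a storage map that wraps exactly one store.
    static final class SingleStoreMap {
        final ProvidedBlocks blocks;
        SingleStoreMap(String remoteUri) { this.blocks = new ProvidedBlocks(remoteUri); }
    }

    // Option A: the block manager owns one storage map per mount point.
    static final class BlockManagerA {
        final Map<String, SingleStoreMap> mapsByMount = new HashMap<>();

        void attach(String mountPoint, String remoteUri) {
            mapsByMount.put(mountPoint, new SingleStoreMap(remoteUri));
        }

        ProvidedBlocks blocksFor(String mountPoint) {
            SingleStoreMap map = mapsByMount.get(mountPoint);
            return map == null ? null : map.blocks;
        }
    }

    // Option B: a single storage map holds one block list per mount point.
    static final class MultiStoreMap {
        final Map<String, ProvidedBlocks> blocksByMount = new HashMap<>();

        void attach(String mountPoint, String remoteUri) {
            blocksByMount.put(mountPoint, new ProvidedBlocks(remoteUri));
        }

        ProvidedBlocks blocksFor(String mountPoint) {
            return blocksByMount.get(mountPoint);
        }
    }
}
```

Option A leaves each storage map as simple as it is today but pushes the mount bookkeeping into the block manager; Option B keeps the block manager unchanged and concentrates that bookkeeping in one object.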

> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9806
>                 URL: https://issues.apache.org/jira/browse/HDFS-9806
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Chris Douglas
>         Attachments: HDFS-9806-design.001.pdf
>
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical, to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
