[ 
https://issues.apache.org/jira/browse/HADOOP-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276822#comment-16276822
 ] 

Steve Loughran commented on HADOOP-15038:
-----------------------------------------

This is something we should bring up on the common-dev list. 

# hadoop-cloud-core sounds nice
# I've been doing some work on cloudup locally (including a 2.7.x build); I'll 
need to submit a new patch
# I've also been doing some other small patches factoring out common code from 
stores (HADOOP-14943), where again, this stuff can be shared

Essentially: we've been copying and pasting stuff between stores and versions, 
and it has reached the limit of what can be maintained. See 
[https://hortonworks.com/blog/history-apache-hadoops-support-amazon-s3/] for 
an illustration of that pasting.

Things I'd like to see in there:

* the core "mimic a filesystem" functions
* standard statistic collection & names
* retry logic of S3A.Invoker
* Support for marshalling login secrets as filesystem delegation tokens (see 
HADOOP-14556). This is needed for users to submit their own credentials & 
encryption keys to shared query engines
* Any CLI utility for listing/viewing things (see "hadoop s3guard 
bucket-info").
* Any CLI utility we can do for diagnostics. Your support team will love you 
for this.
* Any more integration tests we can do beyond the basic abstract contract & 
distcp tests. We now have a variant of the [Hadoop MR protocol test 
suite|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java]
 and [MR 
Job|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitMRJob.java].

One thing to consider, though: the Ozone store under HDFS could ultimately be 
looking at some of this stuff too. That means hadoop-common.jar is the right 
place to put it, at least until there's a compelling reason to split it out. 
(Except: do that, things start to depend on it, and splitting it out becomes 
impossible...)
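
As a rough illustration of the kind of retry logic from the list above that could be shared across stores, here is a minimal sketch. The class name {{RetrySketch}}, its {{Operation}} interface and the fixed-sleep policy are all hypothetical, made up for this example; this is not the actual S3A Invoker API, which has its own retry policies and exception translation.

```java
import java.io.IOException;

/** Hypothetical shared retry helper; an illustration only, not S3A's Invoker. */
final class RetrySketch {

  /** An operation which may fail with an IOException. */
  @FunctionalInterface
  interface Operation<T> {
    T execute() throws IOException;
  }

  private RetrySketch() {
  }

  /**
   * Run the operation, retrying on IOException up to maxAttempts times
   * with a fixed sleep between attempts.
   */
  static <T> T retry(String action, int maxAttempts, long sleepMillis,
      Operation<T> operation) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return operation.execute();
      } catch (IOException e) {
        last = e;
        if (attempt < maxAttempts) {
          try {
            Thread.sleep(sleepMillis);
          } catch (InterruptedException ie) {
            // restore the interrupt flag and give up
            Thread.currentThread().interrupt();
            throw new IOException(action + " interrupted", ie);
          }
        }
      }
    }
    // all attempts failed: wrap the last failure so the cause is preserved
    throw new IOException(
        action + " failed after " + maxAttempts + " attempts", last);
  }
}
```

A store would wrap each remote call in something like {{RetrySketch.retry("getObjectMetadata", 3, 100, () -> client.head(path))}}, where {{client.head}} is again a placeholder. A real shared version would also need to decide which exceptions are retryable and whether the operation is idempotent.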

Thoughts?


BTW, if you haven't noticed, I've got a module designed to do some integration 
with and testing under Apache Spark: 
https://github.com/hortonworks-spark/cloud-integration . At some point I hope 
to submit it to Apache Bahir; for now it's a bit too unstable.



> Abstract MetadataStore in S3Guard into a common module.
> -------------------------------------------------------
>
>                 Key: HADOOP-15038
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15038
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 3.0.0-beta1
>            Reporter: Genmao Yu
>
> Opening this JIRA to discuss whether we should move {{MetadataStore}} in 
> {{S3Guard}} into a common module. 
> Based on this work, other filesystems or object stores could implement their 
> own metastore for optimization (addressing known issues like consistency 
> problems and metadata operation performance). [[email protected]] and others 
> have done a lot of great foundational work in {{S3Guard}}, which is a very 
> helpful starting point. I did some perf tests in HADOOP-14098, and started 
> related work for Aliyun OSS. There is still work to do for {{S3Guard}}, such 
> as the metadata cache becoming inconsistent with S3, and this will also be a 
> problem for other object stores. However, we can do this work in parallel.
> [[email protected]] [~fabbri] [~drankye] Any suggestions are appreciated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

