[
https://issues.apache.org/jira/browse/HADOOP-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276822#comment-16276822
]
Steve Loughran commented on HADOOP-15038:
-----------------------------------------
This is something we should bring up on the common-dev list.
# hadoop-cloud-core sounds nice
# I've been doing some work on cloudup locally (including a 2.7.x build); I'll
need to submit a new patch
# I've also been doing some other small patches factoring out common code from
stores (HADOOP-14943), where again, this stuff can be shared
Essentially: we've been copying and pasting code between versions, and it has
reached the limits of what can be maintained. See
[https://hortonworks.com/blog/history-apache-hadoops-support-amazon-s3/] for
an illustration of that pasting.
Things I'd like to see in there:
* the core "mimic a filesystem" functions
* standard statistic collection & names
* retry logic of S3A.Invoker
* Support for marshalling login secrets as filesystem delegation tokens (see
HADOOP-14556). This is needed for users to submit their own credentials &
encryption keys to shared query engines
* Any CLI utility for listing/viewing things (see "hadoop s3guard
bucket-info").
* Any CLI utility we can do for diagnostics. Your support team will love you
for this.
* Any more integration tests we can do beyond the basic abstract contract &
distcp tests. We now have a variant of the [Hadoop MR protocol test
suite|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java]
and [MR
Job|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitMRJob.java]
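
To make the retry item in the list above concrete, here is a minimal sketch of what a shared retry wrapper might look like. It is not the actual S3A Invoker API: the class name RetryingInvoker, its constructor parameters, and the retry() method are all hypothetical, and the policy shown (retry every IOException with exponential backoff) is an assumption chosen for illustration.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch of a store-independent retry wrapper.
 * This is NOT the S3A Invoker class; names and policy are assumptions.
 */
public class RetryingInvoker {
    private final int maxAttempts;
    private final long baseDelayMillis;

    public RetryingInvoker(int maxAttempts, long baseDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
    }

    /**
     * Run the operation, retrying on IOException with exponential backoff.
     * Any other exception is treated as non-retryable and wrapped at once.
     */
    public <T> T retry(String action, Callable<T> operation) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Delay doubles after each failed attempt.
                    pause(baseDelayMillis << (attempt - 1));
                }
            } catch (Exception e) {
                throw new IOException(action + " failed", e);
            }
        }
        throw new IOException(
            action + " failed after " + maxAttempts + " attempts", last);
    }

    private static void pause(long millis) throws InterruptedIOException {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for callers further up the stack.
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while waiting to retry");
        }
    }
}
```

A store client would wrap each remote call, for example invoker.retry("getFileStatus", () -> client.head(path)), where client and head() stand in for whatever the store's SDK provides. A real shared implementation would presumably need a pluggable policy for deciding which failures are retryable.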
One thing to consider, though: the Ozone store under HDFS could ultimately be
looking at some of this stuff too, which means that hadoop-common.jar is the
right place to put it, at least until there's a compelling reason to split it
out. (Except: do that, and things start to depend on it, and splitting it out
becomes impossible...)
Thoughts?
BTW, if you haven't noticed, I've got a module designed to do some integration
with and testing under Apache Spark:
https://github.com/hortonworks-spark/cloud-integration . At some point I hope
to submit it to Apache Bahir, but for now it's a bit too unstable.
> Abstract MetadataStore in S3Guard into a common module.
> -------------------------------------------------------
>
> Key: HADOOP-15038
> URL: https://issues.apache.org/jira/browse/HADOOP-15038
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
>
> Opening this JIRA to discuss whether we should move {{MetadataStore}} in
> {{S3Guard}} into a common module.
> Based on this work, other filesystems or object stores could implement their
> own metastore for optimization (addressing known issues such as consistency
> problems and metadata operation performance). [[email protected]] and others
> have done a lot of great foundational work in {{S3Guard}}, which is a very
> helpful starting point. I did some perf tests in HADOOP-14098 and started
> related work for Aliyun OSS. There is still work to do for {{S3Guard}}, such
> as the metadata cache becoming inconsistent with S3, and that will also be a
> problem for other object stores. However, we can do this work in parallel.
> [[email protected]] [~fabbri] [~drankye] Any suggestions are appreciated.