[
https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822345#comment-17822345
]
ASF GitHub Bot commented on HADOOP-19085:
-----------------------------------------
HanFreedom opened a new pull request, #6602:
URL: https://github.com/apache/hadoop/pull/6602
### Description of PR
This PR adds a new hadoop-compat-bench module, which introduces a quick HCFS
compatibility assessment tool for FileSystem implementations, as described and
discussed in HADOOP-19085.
### How was this patch tested?
This is a new, standalone module, tested by its own unit tests.
### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> Compatibility Benchmark over HCFS Implementations
> -------------------------------------------------
>
> Key: HADOOP-19085
> URL: https://issues.apache.org/jira/browse/HADOOP-19085
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: Han Liu
> Assignee: Han Liu
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS Compatibility Benchmark Design.pdf
>
>
> *Background:* The Hadoop-Compatible File System (HCFS) is a core concept in
> the big data storage ecosystem. It provides unified interfaces and generally
> clear semantics, and has become the de facto standard for industry storage
> systems to follow and conform to. Hadoop includes a series of HCFS
> implementations, such as S3AFileSystem for Amazon's S3 Object Store, WASB for
> Microsoft's Azure Blob Storage, and the OSS connector for Alibaba Cloud Object
> Storage, and storage service providers maintain more of their own.
> *Problems:* However, as indicated by introduction.md, there is no formal
> suite for assessing the compatibility of a file system across all such HCFS
> implementations. Thus, whether the functionality is fully implemented and
> meets the core compatibility expectations is known mainly from the service
> provider's own report. Meanwhile, Hadoop itself keeps evolving, and new
> features are continuously being added to the HCFS interfaces for existing
> implementations to follow and adopt. Hadoop therefore also needs a tool to
> quickly assess whether a specific HCFS implementation supports these features.
> Besides, the hadoop command line tool, or hdfs shell, is used to interact
> directly with an HCFS storage system; most commands correspond to specific
> HCFS interfaces and work well. Still, some cases are complicated and may not
> work, such as the expunge command. We also need a way to check such commands
> against an HCFS.
> *Proposal:* Accordingly, we propose to define a formal HCFS compatibility
> benchmark and to provide a corresponding tool for assessing the compatibility
> of an HCFS storage system. The benchmark and tool should cover both HCFS
> interfaces and hdfs shell commands. Different scenarios require different
> kinds of compatibility, so the benchmark could define different suites.
> *Benefits:* We intend the benchmark and tool to be useful for both storage
> providers and storage users. For end users, it can be used to evaluate the
> compatibility level and determine if the storage system in question is
> suitable for the required scenarios. For storage providers, it helps to
> quickly generate an objective and reliable report about core functions of
> the storage service. For instance, if an HCFS scores 100% on a suite named
> 'tpcds', this demonstrates that all functions needed by a tpcds program are
> supported. It is also a guide indicating how storage service
> abilities can map to HCFS interfaces, such as storage class on S3.
> Any thoughts? Comments and feedback are most welcome. Thanks in advance.
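
The suite-and-score idea in the issue description above can be illustrated with a minimal Java sketch. It is not the hadoop-compat-bench implementation: the class `CompatSuiteSketch`, its methods, and the case names are hypothetical, and the probes are stand-in lambdas rather than real FileSystem calls. It only shows how a named suite of pass/fail cases could yield a percentage score.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: a named suite of pass/fail compatibility cases. */
class CompatSuiteSketch {
    private final String name;
    // Case name -> probe; insertion order is preserved for stable reports.
    private final Map<String, BooleanSupplier> cases = new LinkedHashMap<>();

    CompatSuiteSketch(String name) {
        this.name = name;
    }

    /** Registers a case and returns this suite so calls can be chained. */
    CompatSuiteSketch addCase(String caseName, BooleanSupplier probe) {
        cases.put(caseName, probe);
        return this;
    }

    /** Runs every case; a probe that throws is counted as a failure. */
    Map<String, Boolean> run() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, BooleanSupplier> e : cases.entrySet()) {
            boolean passed;
            try {
                passed = e.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                passed = false;
            }
            results.put(e.getKey(), passed);
        }
        return results;
    }

    /** Percentage of passed cases; 0.0 when there are no results. */
    static double score(Map<String, Boolean> results) {
        if (results.isEmpty()) {
            return 0.0;
        }
        long passed = results.values().stream().filter(Boolean::booleanValue).count();
        return 100.0 * passed / results.size();
    }

    public static void main(String[] args) {
        // Stand-in probes (assumptions): a real tool would call the file system under test.
        CompatSuiteSketch suite = new CompatSuiteSketch("example")
            .addCase("mkdirs", () -> true)
            .addCase("rename", () -> true)
            .addCase("append", () -> { throw new UnsupportedOperationException("append"); })
            .addCase("setStoragePolicy", () -> false);
        Map<String, Boolean> results = suite.run();
        System.out.printf("suite=%s score=%.1f%%%n", suite.name, score(results));
    }
}
```

In a real tool each probe would exercise a FileSystem operation against the target store. Counting a thrown exception as an unsupported feature is one possible design choice, not something the issue prescribes.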