[ https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821332#comment-17821332 ]

Han Liu commented on HADOOP-19085:
----------------------------------

The code has been updated to a new version, and all issues listed in 
https://github.com/apache/hadoop/pull/6535#issuecomment-1933237018 should now 
be resolved. A new check run has been triggered but has not completed yet.

 

Grateful for the comments from [[email protected]]
{quote} * I think it should be a hadoop one for more than hdfs{quote}
Good suggestion. The issue has already been changed to a hadoop-common issue.
{quote} * hdfs/webhdfs work well as unit tests for the functionality
 * but can/should also target other stores, with s3a and abfs connectors key 
ones for me.{quote}
Yes. The cases introduced by the benchmark are designed as probes of the HCFS 
APIs: they check whether an API is implemented and whether its basic behavior 
is correct, but they are not meant to carry the responsibility of full quality 
assurance. Thus, there should be some differences between benchmark cases and 
the hdfs/webhdfs unit tests.

On the other hand, some of the hdfs/webhdfs unit tests are complex. Targeting 
them at stores other than HDFS is essential work that would benefit the whole 
Hadoop ecosystem. Maybe we could open a new topic to discuss that change.
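To make the "probe" idea concrete, here is a minimal, hypothetical sketch (the class and case names are illustrative only, not taken from the PR) of how a benchmark case could record whether an API is supported, instead of failing the whole run the way a unit test would:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical probe harness: each case exercises one HCFS API and reports
// SUPPORTED / UNSUPPORTED, so the outcome is a compatibility report rather
// than a pass/fail QA verdict.
public class HcfsProbeHarness {
    enum Result { SUPPORTED, UNSUPPORTED }

    interface ProbeCase {
        void run() throws Exception;
    }

    static Map<String, Result> runAll(Map<String, ProbeCase> cases) {
        Map<String, Result> report = new LinkedHashMap<>();
        for (Map.Entry<String, ProbeCase> e : cases.entrySet()) {
            try {
                e.getValue().run();
                report.put(e.getKey(), Result.SUPPORTED);
            } catch (UnsupportedOperationException ex) {
                // The store declares the operation unimplemented:
                // record it and keep probing the remaining APIs.
                report.put(e.getKey(), Result.UNSUPPORTED);
            } catch (Exception ex) {
                report.put(e.getKey(), Result.UNSUPPORTED);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, ProbeCase> cases = new LinkedHashMap<>();
        // Stand-ins for real probes such as mkdirs/rename/truncate,
        // which would call the FileSystem API against the store under test:
        cases.put("mkdirs", () -> { /* e.g. fs.mkdirs(path) */ });
        cases.put("truncate", () -> { throw new UnsupportedOperationException(); });
        for (Map.Entry<String, Result> e : runAll(cases).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

The key design point is that an unsupported operation downgrades a single case in the report rather than aborting the benchmark.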
{quote}one thing with the contract tests is we need the ability to declare when 
a store doesn't quite meet expectations. s3a fs lets you create files under 
files if you try hard; some operations raise different exceptions, permissions 
may be different. so a design which allows for downgrading is critical
{quote}
Good idea. The contract tests directly exercise the core functionality of the 
most important FileSystem APIs. We can open a new issue for further discussion 
of how expectation violations should be declared and reported.
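For reference, Hadoop's existing filesystem contract tests already support this kind of downgrading through per-store contract option files; a hypothetical fragment for an object store might look like the following (the fs.contract.* key names follow the existing convention, but the exact keys should be checked against ContractOptions):

```xml
<configuration>
  <!-- Declare behaviors this store does not fully support, so that the
       contract tests downgrade or skip instead of failing outright. -->
  <property>
    <name>fs.contract.supports-append</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.contract.rename-overwrites-dest</name>
    <value>false</value>
  </property>
</configuration>
```

A similar declarative mechanism in the benchmark would let a store like s3a state up front where it deviates from strict HDFS semantics.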

> Compatibility Benchmark over HCFS Implementations
> -------------------------------------------------
>
>                 Key: HADOOP-19085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19085
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Han Liu
>            Assignee: Han Liu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS Compatibility Benchmark Design.pdf
>
>
> {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in 
> the big data storage ecosystem, providing unified interfaces and generally 
> clear semantics, and it has become the de facto standard for industry storage 
> systems to follow and conform to. There is a series of HCFS implementations 
> in Hadoop, such as S3AFileSystem for Amazon's S3 object store, WASB for 
> Microsoft's Azure Blob Storage and the OSS connector for Alibaba Cloud Object 
> Storage, plus more maintained by storage service providers on their own.
> {*}Problems:{*} However, as indicated by introduction.md, there is no formal 
> suite for assessing the compatibility of all such HCFS implementations. 
> Thus, whether the functionality is well implemented and meets the core 
> compatibility expectations mainly relies on the service provider's own 
> report. Meanwhile, Hadoop keeps evolving, and new features are continuously 
> added to the HCFS interfaces for existing implementations to follow and 
> adopt; Hadoop therefore also needs a tool to quickly assess whether such 
> features are supported by a specific HCFS implementation. Besides, the 
> well-known hadoop command line tool (hdfs shell) is used to interact 
> directly with an HCFS storage system, where most commands correspond to 
> specific HCFS interfaces and work well. Still, some cases are more 
> complicated and may not work, such as the expunge command. To check such 
> commands for an HCFS, we also need an approach to figure them out.
> {*}Proposal:{*} Accordingly, we propose to define a formal HCFS 
> compatibility benchmark and provide a corresponding tool to perform the 
> compatibility assessment of an HCFS storage system. The benchmark and tool 
> should cover both HCFS interfaces and hdfs shell commands. Different 
> scenarios require different kinds of compatibility, so we could define 
> different suites in the benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. For end users, it can be used to evaluate the 
> compatibility level and determine whether the storage system in question is 
> suitable for the required scenarios. For storage providers, it helps to 
> quickly generate an objective and reliable report about core functions of 
> the storage service. For instance, if an HCFS implementation scores 100% on 
> a suite named 'tpcds', this demonstrates that all functions needed by a 
> TPC-DS workload are properly supported. It is also a guide indicating how 
> storage service abilities map to HCFS interfaces, such as storage class on 
> S3.
> Any thoughts? Comments and feedback are most welcome. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
