[ 
https://issues.apache.org/jira/browse/HDDS-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631656#comment-16631656
 ] 

Steve Loughran commented on HDDS-434:
-------------------------------------

I can see the focus here is on upload, and why that makes sense short term. 
However, you have to consider how much of the S3 API needs to be implemented 
for S3A to work against it, and what your long-term goals are.

FWIW, for full upload support I'd recommend adding

* list objects
* multipart upload (MPU)

To use S3A as the client, including for integration testing:

Required: list objects, multipart upload, ranged GET
Preferred: bulk delete.

h3. Path and virtual-host access modes

Most non-AWS S3 endpoints support only path-style access, which avoids the need 
for wildcard DNS entries. The AWS SDK makes the mode optional; setting 
{{fs.s3a.path.style.access}} to true switches S3A to path-style requests. 
Recommend: postpone virtual-host support.
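To make the distinction concrete, here is a minimal sketch of the two addressing modes; the endpoint {{s3g.example.com:9878}} is a made-up placeholder, not a real gateway address:

```java
// Sketch: the two S3 addressing modes, using a hypothetical gateway
// endpoint "s3g.example.com:9878".
public class S3Addressing {

    // Virtual-host style: the bucket is part of the hostname, so it needs
    // wildcard DNS (*.s3g.example.com) resolving to the gateway.
    static String virtualHost(String endpoint, String bucket, String key) {
        return "http://" + bucket + "." + endpoint + "/" + key;
    }

    // Path style: the bucket is the first path segment; no extra DNS needed.
    // S3A selects this mode with fs.s3a.path.style.access=true.
    static String pathStyle(String endpoint, String bucket, String key) {
        return "http://" + endpoint + "/" + bucket + "/" + key;
    }
}
```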

h3. Get Bucket

HADOOP-15409 proposes moving to doesBucketExistV2(), which calls 
getBucketAcl(). That should be handled (I think it's just an extra ?acl query 
parameter on Get Bucket).

h3. GET

* ranges must be supported
* even a range of 0-0 on a 0-byte file
* eventually, an ETag of some form; HADOOP-15625 wants this
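As an illustration of what ranged GET support implies, here is a minimal sketch of parsing the {{bytes=start-end}} form of the Range header, the form assumed here (real requests can also use open-ended and suffix ranges, which this sketch deliberately rejects):

```java
// Sketch: parsing the HTTP Range header in the "bytes=start-end" form
// (both offsets inclusive), including the degenerate "bytes=0-0" that a
// client may issue even against an empty object.
public class ByteRange {
    final long start, end;   // inclusive byte offsets

    ByteRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    static ByteRange parse(String header) {
        if (!header.startsWith("bytes=")) {
            throw new IllegalArgumentException("unsupported range: " + header);
        }
        // Only the explicit start-end form is handled in this sketch.
        String[] parts = header.substring("bytes=".length()).split("-", 2);
        return new ByteRange(Long.parseLong(parts[0]), Long.parseLong(parts[1]));
    }

    long length() {
        return end - start + 1;
    }
}
```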

h3. MPU

Multipart upload is how S3ABlockOutputStream writes files of size > 
fs.s3a.block.size to S3. It's also used by the multipart committers and for 
parallel upload in {{copyFromLocalFile}}. I know it's complex, but if it's left 
out, you can't test with S3AFileSystem.

* the committers also use list-multipart-uploads to clean things up, but that 
could probably just return an empty listing for now
* list parts isn't used at all
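To illustrate the call sequence a server has to support, here is a toy in-memory model of the three core MPU operations. The method names mirror the S3 API; the storage and everything else is illustrative only, not the Ozone design:

```java
import java.io.ByteArrayOutputStream;
import java.util.*;

// Sketch: the multipart-upload call sequence S3A relies on, modelled as a
// toy in-memory server. Real parts are at least 5 MB and are committed via
// their ETags; both details are elided here.
public class MpuSketch {
    // uploadId -> (partNumber -> part bytes)
    final Map<String, SortedMap<Integer, byte[]>> uploads = new HashMap<>();

    // InitiateMultipartUpload: returns the uploadId later calls must quote.
    String initiate(String key) {
        String uploadId = UUID.randomUUID().toString();
        uploads.put(uploadId, new TreeMap<>());
        return uploadId;
    }

    // UploadPart: parts may arrive in any order, possibly in parallel.
    void uploadPart(String uploadId, int partNumber, byte[] data) {
        uploads.get(uploadId).put(partNumber, data);
    }

    // CompleteMultipartUpload: the object is the parts concatenated in
    // ascending part-number order, regardless of upload order.
    byte[] complete(String uploadId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : uploads.remove(uploadId).values()) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }
}
```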

h3. List objects

The (paged) list call is ubiquitous in S3A and other applications. 

There's a v1 vs v2 variant of the call; clients have to make a conscious 
decision about which to use. V2 is v1 with fewer fields returned and support 
for a variable page size. I don't know which applications will only work with 
the v2 list.
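Whichever version is implemented, the server-side paging contract is the same: keys come back in lexicographic order, a page at a time, up to the client-chosen max-keys, with a marker (v1) or continuation token (v2) linking the pages. A minimal sketch of that slicing, with the token mechanics elided:

```java
import java.util.*;

// Sketch: the paging contract behind list-objects. The marker /
// continuation-token plumbing is elided; what is shown is the ordering
// and page-size behaviour both protocol versions share.
public class ListPaging {
    static List<List<String>> pages(List<String> keys, int maxKeys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);   // S3 listings are key-ordered
        List<List<String>> pages = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += maxKeys) {
            pages.add(sorted.subList(i, Math.min(i + maxKeys, sorted.size())));
        }
        return pages;
    }
}
```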

h3. Multi-object delete call

This is much more efficient for bulk deletion (e.g. of parent directories), but 
can be turned off in S3A via {{fs.s3a.multiobjectdelete.enable}}. I don't know 
which applications will only work with bulk delete.
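One request cap bounds the work: AWS limits a single DeleteObjects request to 1000 keys, so clients batch larger deletes into a series of calls. A sketch of that batching, assuming the AWS limit:

```java
import java.util.*;

// Sketch: bulk delete batches keys into DeleteObjects requests. AWS caps
// one request at 1000 keys, so deleting a large "directory" becomes a few
// batched calls rather than one DELETE per object.
public class BulkDelete {
    static final int MAX_KEYS_PER_REQUEST = 1000;   // AWS DeleteObjects limit

    static List<List<String>> batches(List<String> keys) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += MAX_KEYS_PER_REQUEST) {
            out.add(keys.subList(i, Math.min(i + MAX_KEYS_PER_REQUEST, keys.size())));
        }
        return out;
    }
}
```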

h2. Testing and exit criteria

I'd make the S3A filesystem the authoritative test tool for the project, it 
being built on the AWS SDK. There is a whole suite of tests in hadoop-aws which 
are designed to be pointed at S3 implementations other than AWS; you just need 
to disable the tests for features you don't provide (encryption, the Security 
Token Service) and set up endpoints and signing.
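For reference, the sort of configuration needed to point the S3A client (and the hadoop-aws test suite) at a non-AWS endpoint might look like this; the host/port and credentials below are placeholders, while the property names are the standard S3A client options:

```xml
<!-- Sketch: core-site.xml fragment targeting a local, non-AWS S3 endpoint.
     Endpoint and credentials are placeholders. -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>localhost:9878</value>
  </property>
  <property>
    <!-- plain HTTP for a local test endpoint -->
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>false</value>
  </property>
  <property>
    <!-- path-style addressing: no wildcard DNS needed -->
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>...</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>...</value>
  </property>
</configuration>
```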

If you choose to implement your own tests, you get to implement and maintain 
your own tests. While that can be justified for the basic HTTP verbs, such 
tests don't represent "in-the-field" uses of the APIs.


> Provide an s3 compatible REST api for ozone objects
> ---------------------------------------------------
>
>                 Key: HDDS-434
>                 URL: https://issues.apache.org/jira/browse/HDDS-434
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>         Attachments: S3Gateway.pdf
>
>
> S3 REST api is the de facto standard for object stores. Many external tools 
> already support it.
> This issue is about creating a new s3gateway component which implements (most 
> part of) the s3 API using the internal RPC calls.
> Some part of the implementation is very straightforward: we need a new 
> service with the usual REST stack and we need to implement the most common 
> GET/POST/PUT calls. Some other parts (authorization, multipart upload) are 
> more tricky.
> Here I suggest an incremental approach: first we can implement a skeleton 
> service which could support read-only requests without authorization, and we 
> can define a proper specification for the upload part / authorization during 
> the work.
> As of now the gateway service could be a new standalone application (e.g. 
> ozone s3g start); later we can modify it to work as a DatanodePlugin similar 
> to the existing object store plugin. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
