Steve Loughran created HADOOP-18651:
---------------------------------------
Summary: Add "versions" tool to s3a command line entry point
Key: HADOOP-18651
URL: https://issues.apache.org/jira/browse/HADOOP-18651
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.3.9
Reporter: Steve Loughran
Having just implemented some version command support in the cloudstore jar, I
can see the benefit of implementing it in the hadoop-aws module itself:
https://github.com/steveloughran/cloudstore/blob/trunk/src/main/site/versioned-objects.md
https://github.com/steveloughran/cloudstore/blob/trunk/src/main/extra/org/apache/hadoop/fs/s3a/extra/
this code
* uses v1 sdk by asking the s3a fs for it; this will break with the move to v2
sdk
* doesn't have any tests
* doesn't have any review or maintenance plan
* bypasses audit log/referrer header creation
we could just say "use the aws CLI", but there are some benefits in using the
s3a connector code
* support for s3a:// urls
* can use the s3a auth/signing chain (knox, etc)
* plus proxy, region settings etc.
* could integrate with other bits of the stack (e.g. a Spark RDD to get at all
versions of objects)
* would be really useful to have a tool to purge all directory delete markers
down a path, to speed up listing on versioned buckets.
* gets bundled everywhere
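To make the delete-marker purge idea concrete, here is a minimal sketch of the selection step only: picking directory delete markers out of a version listing under a path. It uses no Hadoop or AWS SDK classes. `DeleteMarkerSketch`, `VersionEntry` and `directoryMarkersUnder` are hypothetical names invented for illustration, and it assumes a directory marker can be recognised as a delete marker whose key ends in "/".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not part of hadoop-aws or cloudstore. */
class DeleteMarkerSketch {

  /** Minimal stand-in for one entry in an S3 object version listing. */
  static final class VersionEntry {
    final String key;
    final String versionId;
    final boolean deleteMarker;

    VersionEntry(String key, String versionId, boolean deleteMarker) {
      this.key = key;
      this.versionId = versionId;
      this.deleteMarker = deleteMarker;
    }
  }

  /**
   * Select the delete markers sitting on directory keys under a prefix.
   * Assumption: a directory key is any key ending in "/".
   */
  static List<VersionEntry> directoryMarkersUnder(
      String prefix, List<VersionEntry> listing) {
    List<VersionEntry> result = new ArrayList<>();
    for (VersionEntry entry : listing) {
      if (entry.deleteMarker
          && entry.key.startsWith(prefix)
          && entry.key.endsWith("/")) {
        result.add(entry);
      }
    }
    return result;
  }
}
```

A real tool would then issue versioned deletes for the selected entries through the s3a connector, so that auth, proxy, region and audit settings are picked up; that part is deliberately left out here.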
For use by downstream code we would want to have a public/evolving API to
access operations, e.g.
# taking an S3AFileStatus for rename/purge/restore operations
# listing all versions of objects under a path within a given time range and
mapping to RemoteIterator.
# HADOOP-16387. S3A openFile() options to allow etag/version to be set
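As a sketch of what the second item might look like, the following filters a version listing to a time range and exposes it through an iterator whose methods may throw IOException. Everything here is an assumption for illustration: `VersionListingSketch`, `ObjectVersion` and `versionsBetween` are hypothetical names, the nested `RemoteIterator` is a local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator so the example compiles without Hadoop on the classpath, and the listing is an in-memory list rather than paged S3 results.

```java
import java.io.IOException;
import java.time.Instant;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch; not an existing hadoop-aws API. */
class VersionListingSketch {

  /** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
  interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  /** Minimal stand-in for one version of an object. */
  static final class ObjectVersion {
    final String key;
    final String versionId;
    final Instant lastModified;

    ObjectVersion(String key, String versionId, Instant lastModified) {
      this.key = key;
      this.versionId = versionId;
      this.lastModified = lastModified;
    }
  }

  /** Iterate over versions modified in the half-open range [from, to). */
  static RemoteIterator<ObjectVersion> versionsBetween(
      List<ObjectVersion> source, Instant from, Instant to) {
    final Iterator<ObjectVersion> it = source.iterator();
    return new RemoteIterator<ObjectVersion>() {
      private ObjectVersion pending; // next match, fetched lazily

      @Override
      public boolean hasNext() {
        while (pending == null && it.hasNext()) {
          ObjectVersion candidate = it.next();
          if (!candidate.lastModified.isBefore(from)
              && candidate.lastModified.isBefore(to)) {
            pending = candidate;
          }
        }
        return pending != null;
      }

      @Override
      public ObjectVersion next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ObjectVersion result = pending;
        pending = null;
        return result;
      }
    };
  }
}
```

The lazy lookahead in hasNext() is there because a real implementation would fetch listing pages on demand, which is where the IOException in the iterator signature comes from.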
The core code is straightforward (roughly two days to write, *excluding
tests*); the public API and tests are more work.
(Note: we should also move the entry point to being "s3a", with "s3guard"
retained for compatibility.)