[
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HADOOP-13560:
------------------------------------
Status: Patch Available (was: Open)
Commit fc16e03c; Patch 005. Moved all the operations in the block output stream
which directly interacted with the s3 client into a new inner class of
S3AFileSystem, WriteOperationState. This cleanly separates the output stream's
work (buffering of data and queuing of uploads) from the upload process itself.
I think S3Guard may be able to do something with this, but I also hope to use
it as a start for async directory list/delete operations; this class would
track create-time probes, and initiate the async deletion of directory objects
after a successful write. That's why there are separate callbacks for
writeSuccessful and writeFailed: we only want to spawn off the deletion when
the write succeeds.
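To show the intended split, here's a minimal sketch of the shape such a helper
could take. Only the class name WriteOperationState and the two callback names
come from the patch; every signature below is illustrative, not the actual code:
{code:java}
// Illustrative sketch only: WriteOperationState and the two callbacks come
// from the patch description; everything else here is an assumption.
public class S3AFileSystemSketch {

  /** Owns all direct interaction with the S3 client, so the output stream
   *  is left with only buffering of data and queuing of uploads. */
  class WriteOperationState {
    private final String key;

    WriteOperationState(String key) {
      this.key = key;
    }

    /** Invoked once a write completes successfully: the natural point to
     *  kick off async deletion of fake parent directory objects. */
    void writeSuccessful() {
      // hypothetical: deleteFakeParentDirectoriesAsync(key);
    }

    /** Invoked when a write fails; no directory cleanup should run. */
    void writeFailed(Exception cause) {
      // hypothetical: update the stream's failure statistics
    }
  }
}
{code}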
In the process of coding all this, I managed to break multipart uploads; this
has led to a clearer understanding of how part uploads fail, and to improvements
in statistics collection and in the tests.
Otherwise,
* got the imports back in sync with branch-2; the IDE had somehow rearranged
them.
* more detailed docs.
* manual testing through all the FS operations.
* locally switched all the s3a tests to use this (i.e. turned on block output
in auth-keys.xml; see the configuration sketch below).
I think this is ready for review and play. I'd recommend the disk block buffer
except in the special case where you know you can upload data faster than you
can generate it, and you want to bypass the disk. But I'd be curious about
performance numbers there, especially on distcp operations with s3a as the
destination.
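For reference, a sketch of the switches involved; the property names are my
reading of the branch-2 fast upload options, so verify them against the docs
shipped with the patch before copying:
{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch of switching to block output with the disk buffer; property names
// are assumptions based on the branch-2 fast upload options.
public class BlockOutputConfigSketch {
  public static Configuration blockOutputConf() {
    Configuration conf = new Configuration();
    // enable the block output stream in place of the classic output stream
    conf.setBoolean("fs.s3a.fast.upload", true);
    // buffer blocks on local disk; "array" would buffer in the JVM heap
    conf.set("fs.s3a.fast.upload.buffer", "disk");
    return conf;
  }
}
{code}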
> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
> Key: HADOOP-13560
> URL: https://issues.apache.org/jira/browse/HADOOP-13560
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Attachments: HADOOP-13560-branch-2-001.patch,
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch,
> HADOOP-13560-branch-2-004.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really
> works.
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very
> large commit operations by committers using rename.
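A sketch of what that test could look like, contract-test style; every helper
method below is a hypothetical stand-in, not an existing test API:
{code:java}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

/** Sketch only: the abstract helpers are hypothetical stand-ins. */
public abstract class AbstractHugeRenameSketch {

  /** Hypothetical: bind to an S3A filesystem from the test configuration. */
  protected abstract FileSystem getS3AFileSystem() throws Exception;

  /** Hypothetical: write a file big enough to force a multipart copy. */
  protected abstract long writeHugeFile(FileSystem fs, Path p) throws Exception;

  /** Hypothetical: read back the user metadata attached to an object. */
  protected abstract String getUserMetadata(FileSystem fs, Path p)
      throws Exception;

  @Test
  public void testHugeRenamePreservesMetadata() throws Exception {
    FileSystem fs = getS3AFileSystem();
    Path src = new Path("/tests/huge-src");
    Path dst = new Path("/tests/huge-dst");

    long len = writeHugeFile(fs, src);
    String srcMeta = getUserMetadata(fs, src);

    assertTrue("rename returned false", fs.rename(src, dst));
    assertEquals("length changed in copy", len,
        fs.getFileStatus(dst).getLen());
    // The linked AWS SDK issue: metadata may be dropped on large copies.
    assertEquals("metadata lost on copy", srcMeta, getUserMetadata(fs, dst));
  }
}
{code}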