[jira] [Updated] (HADOOP-13560) S3ABlockOutputStream to support huge (many GB) file writes

Steve Loughran (JIRA) Tue, 20 Sep 2016 12:19:46 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran updated HADOOP-13560:
------------------------------------
    Attachment: HADOOP-13560-branch-2-003.patch

Patch 003

* (pooled) ByteBuffer is now an option for buffering output. This should offer 
in-memory performance with less risk of heap overflow, but it can still use 
enough memory that your YARN-hosted JVMs get killed, so it is still only to be 
used with care.
* Replaced S3AFastOutputStream. The option that enabled it is deprecated and 
downgraded to buffered + file.
* Pulled all the fast output stream tests except a small one that verifies that 
the options still work.
* I've not deleted the S3AFastOutputStream class yet; it's there for comparing 
new vs. old.
* javadocs in more places
* core-default.xml descriptions improved
* index.md updated with new values, more text
* Tests pass the scale-test Maven options down to sequential test runs.
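
To illustrate the pooled ByteBuffer trade-off from the first bullet, here is a 
minimal sketch. It is not the code in the patch: the class DirectBufferPool and 
its methods are hypothetical names, and only standard java.nio calls are used. 
Direct buffers are allocated outside the Java heap, so they do not count 
against the heap limit, but they still count toward the memory of the process, 
which is what YARN monitors.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical pool of fixed-size direct ByteBuffers.
 * Illustration only; this is not the class used by the S3A patch.
 */
final class DirectBufferPool {
  private final int blockSize;
  private final ConcurrentLinkedQueue<ByteBuffer> free =
      new ConcurrentLinkedQueue<>();
  private final AtomicInteger allocated = new AtomicInteger();

  DirectBufferPool(int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    this.blockSize = blockSize;
  }

  /** Reuse a released buffer if one is available, else allocate off-heap. */
  ByteBuffer acquire() {
    ByteBuffer buf = free.poll();
    if (buf == null) {
      buf = ByteBuffer.allocateDirect(blockSize);
      allocated.incrementAndGet();
    }
    buf.clear(); // reset position and limit; old contents are not zeroed
    return buf;
  }

  /** Return a buffer to the pool once its block has been uploaded. */
  void release(ByteBuffer buf) {
    if (buf.isDirect() && buf.capacity() == blockSize) {
      free.add(buf);
    }
  }

  /** Total off-heap bytes ever allocated by this pool. */
  long offHeapBytes() {
    return (long) allocated.get() * blockSize;
  }
}
```

In this sketch nothing caps the number of buffers, so a writer that produces 
blocks faster than they are uploaded keeps allocating. That is the kind of 
growth that can get a container killed even though the heap itself looks fine.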

Test endpoint: S3 Ireland

I think this code is ready for review/testing by others. Could anyone doing 
this start with the documentation, to see whether it explains the feature, and 
then go into the code? Ideally I'd like some testing of large distcps with the 
file buffering (to verify that it scales) and with the bytebuffer (to see how 
it fails, so that can be added to the troubleshooting docs).

> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
>                 Key: HADOOP-13560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13560
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13560-branch-2-001.patch, 
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights 
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really 
> works
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very 
> large commit operations for committers using rename



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

