[
https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Demoor updated HADOOP-11183:
-----------------------------------
Attachment: HADOOP-11183-005.patch
Marked as unstable.
The underlying httpclient retries retriable errors, and you can control the
retry count through fs.s3a.attempts.maximum.
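For example, a minimal core-site.xml sketch (the value 10 is purely
illustrative, not a recommendation):
{code:xml}
<property>
  <name>fs.s3a.attempts.maximum</name>
  <!-- How many times the underlying httpclient retries a retriable
       request before giving up; 10 is just an example value. -->
  <value>10</value>
</property>
{code}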
Did some more investigation: the dominant part of the time a failure takes is
DNS failing to resolve. After that, the subsequent parts fail fast, in line
with what is set in fs.s3a.connection.establish.timeout. So the fail-fast I
had in mind (and have implemented) seems like premature optimization. I have
been testing the current code for some time, so I think we shouldn't take the
risk of putting fail-fast in this close to 2.7; I'll open a separate JIRA for
fail-fast.
Added site and core-default documentation. In passing, I also corrected the
description of the connection timeouts: they are defined in milliseconds, not
seconds.
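To make the units concrete, a core-site.xml sketch of the two timeouts (the
values are examples in milliseconds, not recommended settings):
{code:xml}
<property>
  <name>fs.s3a.connection.establish.timeout</name>
  <!-- Time to wait for the socket connection to be set up,
       in milliseconds (example: 5 seconds). -->
  <value>5000</value>
</property>
<property>
  <name>fs.s3a.connection.timeout</name>
  <!-- Time to wait on an established socket, in milliseconds
       (example: 50 seconds). -->
  <value>50000</value>
</property>
{code}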
> Memory-based S3AOutputstream
> ----------------------------
>
> Key: HADOOP-11183
> URL: https://issues.apache.org/jira/browse/HADOOP-11183
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.6.0
> Reporter: Thomas Demoor
> Assignee: Thomas Demoor
> Attachments: HADOOP-11183-004.patch, HADOOP-11183-005.patch,
> HADOOP-11183.001.patch, HADOOP-11183.002.patch, HADOOP-11183.003.patch,
> design-comments.pdf
>
>
> Currently s3a buffers files on disk(s) before uploading. This JIRA
> investigates adding a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users
> with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on
> an S3-compatible object store (FYI: my contributions are made on behalf of
> Amplidata).
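For anyone trying out the attached patch: assuming the opt-in switch in the
005 patch is named fs.s3a.fast.upload (property name taken from the patch as
an assumption, not from released documentation), enabling the memory-based
stream would look like:
{code:xml}
<property>
  <name>fs.s3a.fast.upload</name>
  <!-- Assumed opt-in switch for the memory-buffered output stream;
       when false (the assumed default) s3a keeps buffering on disk. -->
  <value>true</value>
</property>
{code}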
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)