[ https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-13028:
------------------------------------
    Attachment: HADOOP-13028-002.patch

HADOOP-13028 patch 002:
 -include fix for HADOOP-13058: "S3A FS fails during init against a read-only 
FS if multipart purge is enabled"
 -include fix for HADOOP-13059: "S3a over-reacts to potentially transient network 
problems in its init() logic"
 -add more counters: number of read operations, readFully calls, and incomplete 
reads (i.e. when fewer bytes came back than were asked for)
 -the statistics counters for the new operations take some of the operation 
arguments; these are currently ignored, but would permit more detailed 
collection later (histograms etc.)
 -add a test, {{TestS3AInputStreamPerformance}}, which runs operations 
against a (configurable) large file; the default is a public 20MB AWS Landsat 
CSV.gz file. The file path can be changed for testing against other 
infrastructures
 -FSDataInputStream extends its toString() call to include that of the wrapped 
stream, so that streams which add details to their own toString() have those 
details pulled in.
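
To illustrate the counter and toString() items above, here is a minimal sketch. 
It is not the code in the patch: the class names {{StreamStatisticsSketch}} and 
{{WrappingStreamSketch}}, and every method name in them, are hypothetical. It 
assumes that an incomplete read is one that returns at least zero bytes but 
fewer than requested, so an end-of-stream result of -1 is not counted.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-stream statistics; names are illustrative, not from the patch. */
class StreamStatisticsSketch {
  private final AtomicLong readOperations = new AtomicLong();
  private final AtomicLong readFullyOperations = new AtomicLong();
  private final AtomicLong readsIncomplete = new AtomicLong();

  /** pos and len are accepted but ignored; they could later feed histograms. */
  void readOperationStarted(long pos, long len) {
    readOperations.incrementAndGet();
  }

  /** pos and len are accepted but ignored, as above. */
  void readFullyOperationStarted(long pos, long len) {
    readFullyOperations.incrementAndGet();
  }

  /** Count a read that returned fewer bytes than requested (assumption: -1/EOF is skipped). */
  void readOperationCompleted(int requested, int actual) {
    if (actual >= 0 && actual < requested) {
      readsIncomplete.incrementAndGet();
    }
  }

  long getReadOperations() { return readOperations.get(); }
  long getReadFullyOperations() { return readFullyOperations.get(); }
  long getReadsIncomplete() { return readsIncomplete.get(); }

  @Override
  public String toString() {
    return "StreamStatisticsSketch{readOperations=" + readOperations.get()
        + ", readFullyOperations=" + readFullyOperations.get()
        + ", readsIncomplete=" + readsIncomplete.get() + "}";
  }
}

/** Hypothetical wrapper whose toString() appends that of the wrapped object. */
class WrappingStreamSketch {
  private final Object wrapped;

  WrappingStreamSketch(Object wrapped) {
    this.wrapped = wrapped;
  }

  @Override
  public String toString() {
    // Default Object.toString() of the wrapper, followed by the wrapped object's details.
    return super.toString() + ": " + wrapped;
  }
}
```

With this shape, printing the wrapper is enough to surface the inner 
statistics in a log line, without the caller needing to unwrap anything.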


> add counter and timer metrics for S3A HTTP & low-level operations
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch
>
>
> Against S3 (and other object stores), opening connections can be expensive, 
> and closing connections may be expensive too (a sign of a regression). 
> S3A FS and individual input streams should have counters of the number of 
> open/close/failure+reconnect operations, and timers of how long things take. 
> This can be used downstream to measure the efficiency of the code (how often 
> connections are being made), connection reliability, etc.
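
The counters and timers asked for in the description could be sketched roughly 
as below. This is an assumption-laden illustration, not the S3A 
implementation: {{ConnectionMetricsSketch}} and its methods are hypothetical 
names, and durations are assumed to be measured by the caller in nanoseconds.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical connection metrics: counts of operations plus accumulated durations. */
class ConnectionMetricsSketch {
  private final AtomicLong opens = new AtomicLong();
  private final AtomicLong closes = new AtomicLong();
  private final AtomicLong reconnects = new AtomicLong();
  private final AtomicLong openNanos = new AtomicLong();
  private final AtomicLong closeNanos = new AtomicLong();

  /** Record one connection open and how long it took. */
  void connectionOpened(long durationNanos) {
    opens.incrementAndGet();
    openNanos.addAndGet(durationNanos);
  }

  /** Record one connection close and how long it took. */
  void connectionClosed(long durationNanos) {
    closes.incrementAndGet();
    closeNanos.addAndGet(durationNanos);
  }

  /** Record a failure that was followed by a reconnect. */
  void reconnected() {
    reconnects.incrementAndGet();
  }

  long getOpens() { return opens.get(); }
  long getCloses() { return closes.get(); }
  long getReconnects() { return reconnects.get(); }

  /** Mean open time in nanoseconds, or 0 if nothing has been opened yet. */
  long meanOpenNanos() {
    long n = opens.get();
    return n == 0 ? 0 : openNanos.get() / n;
  }

  /** Mean close time in nanoseconds, or 0 if nothing has been closed yet. */
  long meanCloseNanos() {
    long n = closes.get();
    return n == 0 ? 0 : closeNanos.get() / n;
  }
}
```

A caller would take System.nanoTime() before and after the open or close and 
pass in the difference. A rising mean close time would then show up as the 
regression signal mentioned in the description.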



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
