[
https://issues.apache.org/jira/browse/HADOOP-17414?focusedWorklogId=534421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-534421
]
ASF GitHub Bot logged work on HADOOP-17414:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 11/Jan/21 18:44
Start Date: 11/Jan/21 18:44
Worklog Time Spent: 10m
Work Description: steveloughran commented on pull request #2530:
URL: https://github.com/apache/hadoop/pull/2530#issuecomment-758146786
The next revision of this PR:
* tries to address all review comments
* adds stats gathering
* adds almost all the AWS headers (everything except some of the
encryption-related ones) to the XAttrs that are listed.
Log of a test run against a newly created file in a bucket with SSE-S3
encryption and versioning enabled:
```
2021-01-11 18:32:24,009 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Cache-Control has
bytes[0] => ""
2021-01-11 18:32:24,009 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Disposition has
bytes[0] => ""
2021-01-11 18:32:24,009 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Encoding has
bytes[0] => ""
2021-01-11 18:32:24,009 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Language has
bytes[0] => ""
2021-01-11 18:32:24,010 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Length has
bytes[1] => "0"
2021-01-11 18:32:24,010 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-MD5 has bytes[0]
=> ""
2021-01-11 18:32:24,010 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Range has
bytes[0] => ""
2021-01-11 18:32:24,011 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Type has
bytes[24] => "application/octet-stream"
2021-01-11 18:32:24,011 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.ETag has bytes[32] =>
"d41d8cd98f00b204e9800998ecf8427e"
2021-01-11 18:32:24,011 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Last-Modified has
bytes[28] => "Mon Jan 11 18:32:24 GMT 2021"
2021-01-11 18:32:24,012 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-archive-status has
bytes[0] => ""
2021-01-11 18:32:24,012 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) -
header.x-amz-object-lock-legal-hold has bytes[0] => ""
2021-01-11 18:32:24,013 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-object-lock-mode
has bytes[0] => ""
2021-01-11 18:32:24,014 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) -
header.x-amz-object-lock-retain-until-date has bytes[0] => ""
2021-01-11 18:32:24,014 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-replication-status
has bytes[0] => ""
2021-01-11 18:32:24,014 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) -
header.x-amz-server-side-encryption has bytes[6] => "AES256"
2021-01-11 18:32:24,015 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-storage-class has
bytes[0] => ""
2021-01-11 18:32:24,015 [JUnit-testXAttrFileCost] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-version-id has
bytes[32] => "XeajHuYbsD1rO.Bh.6UKqnqVMCZvkWg1"
```
And for the curious, the same listing for the root path, /:
```
2021-01-11 18:32:21,207 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Cache-Control has
bytes[0] => ""
2021-01-11 18:32:21,208 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Disposition has
bytes[0] => ""
2021-01-11 18:32:21,208 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Encoding has
bytes[0] => ""
2021-01-11 18:32:21,208 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Language has
bytes[0] => ""
2021-01-11 18:32:21,208 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Length has
bytes[1] => "0"
2021-01-11 18:32:21,209 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-MD5 has bytes[0]
=> ""
2021-01-11 18:32:21,210 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Range has
bytes[0] => ""
2021-01-11 18:32:21,210 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Content-Type has
bytes[15] => "application/xml"
2021-01-11 18:32:21,213 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.ETag has bytes[0] => ""
2021-01-11 18:32:21,213 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.Last-Modified has
bytes[0] => ""
2021-01-11 18:32:21,213 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-archive-status has
bytes[0] => ""
2021-01-11 18:32:21,213 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) -
header.x-amz-object-lock-legal-hold has bytes[0] => ""
2021-01-11 18:32:21,213 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-object-lock-mode
has bytes[0] => ""
2021-01-11 18:32:21,214 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) -
header.x-amz-object-lock-retain-until-date has bytes[0] => ""
2021-01-11 18:32:21,214 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-replication-status
has bytes[0] => ""
2021-01-11 18:32:21,214 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) -
header.x-amz-server-side-encryption has bytes[0] => ""
2021-01-11 18:32:21,215 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-storage-class has
bytes[0] => ""
2021-01-11 18:32:21,215 [JUnit-testXAttrRoot] INFO impl.ITestXAttrCost
(ITestXAttrCost.java:lambda$logXAttrs$2(78)) - header.x-amz-version-id has
bytes[0] => ""
```
And the statistics from that test run:
```
2021-01-11 18:32:26,484 [JUnit] INFO s3a.AbstractS3ATestBase
(AbstractS3ATestBase.java:dumpFileSystemIOStatistics(99)) - Aggregate
FileSystem Statistics counters=((action_executor_acquired=1)
(action_http_head_request=11)
(directories_created=2)
(directories_deleted=1)
(fake_directories_deleted=1)
(files_created=1)
(files_deleted=1)
(object_bulk_delete_request=2)
(object_delete_objects=3)
(object_delete_request=1)
(object_list_request=8)
(object_metadata_request=11)
(object_put_request=3)
(object_put_request_completed=3)
(op_create=1)
(op_delete=2)
(op_exists=2)
(op_get_file_status=2)
(op_list_files=1)
(op_mkdirs=2)
(op_xattr_get_map=2)
(op_xattr_get_named=1)
(op_xattr_list=2)
(stream_write_block_uploads=1));
gauges=();
minimums=((action_executor_acquired.min=0)
(action_http_head_request.min=16)
(object_bulk_delete_request.min=46)
(object_delete_request.min=37)
(object_list_request.min=33)
(op_xattr_get_map.min=17)
(op_xattr_get_named.min=36)
(op_xattr_list.min=18));
maximums=((action_executor_acquired.max=0)
(action_http_head_request.max=903)
(object_bulk_delete_request.max=97)
(object_delete_request.max=37)
(object_list_request.max=934)
(op_xattr_get_map.max=33)
(op_xattr_get_named.max=36)
(op_xattr_list.max=22));
means=((action_executor_acquired.mean=(samples=1, sum=0, mean=0.0000))
(action_http_head_request.mean=(samples=11, sum=1279, mean=116.2727))
(object_bulk_delete_request.mean=(samples=2, sum=143, mean=71.5000))
(object_delete_request.mean=(samples=1, sum=37, mean=37.0000))
(object_list_request.mean=(samples=8, sum=5094, mean=636.7500))
(op_xattr_get_map.mean=(samples=2, sum=50, mean=25.0000))
(op_xattr_get_named.mean=(samples=1, sum=36, mean=36.0000))
(op_xattr_list.mean=(samples=2, sum=40, mean=20.0000)));
```
@sunchao XAttr calls seem to take ~25 milliseconds over a long-haul link. If
someone called getXAttr(name) once per attribute to enumerate them, it would
be expensive. A single getXAttrs call that fetches multiple attributes
performs better.
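To put rough numbers on that, here is a minimal sketch that estimates the cost
of the two access patterns. It assumes the timings in the statistics above are
milliseconds, takes the single op_xattr_get_named sample (36) and the
op_xattr_get_map mean (25) as representative, and uses the 18 "header."
attributes shown in the file listing. The class and method names are
hypothetical and only do arithmetic; nothing here calls S3A.

```java
/** Hypothetical cost model; names are illustrative and not part of Hadoop. */
final class XAttrCostEstimate {

  // Taken from the statistics dump above; assumed to be milliseconds.
  static final long NAMED_CALL_MILLIS = 36; // op_xattr_get_named.mean (1 sample)
  static final long MAP_CALL_MILLIS = 25;   // op_xattr_get_map.mean (2 samples)

  /** One getXAttr(name) call per attribute: cost grows linearly. */
  static long oneByOneMillis(int attributeCount) {
    return attributeCount * NAMED_CALL_MILLIS;
  }

  /** A single bulk call returning every attribute: cost stays flat. */
  static long bulkMillis() {
    return MAP_CALL_MILLIS;
  }

  public static void main(String[] args) {
    int headers = 18; // number of "header." attributes in the file listing
    System.out.println("one by one: ~" + oneByOneMillis(headers) + " ms");
    System.out.println("bulk:       ~" + bulkMillis() + " ms");
  }
}
```

With those inputs the one-by-one estimate is 648 ms against 25 ms for the bulk
call. The sample counts are tiny, so treat this as an order-of-magnitude
comparison only.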
Issue Time Tracking
-------------------
Worklog Id: (was: 534421)
Time Spent: 6h 40m (was: 6.5h)
> Magic committer files don't have the count of bytes written collected by spark
> ------------------------------------------------------------------------------
>
> Key: HADOOP-17414
> URL: https://issues.apache.org/jira/browse/HADOOP-17414
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> The Spark statistics tracking doesn't correctly assess the size of the
> uploaded files, as it only calls getFileStatus on the zero-byte objects, not
> the yet-to-manifest files. Given that those files don't exist yet, that
> isn't easy to do.
> Solution:
> * Add getXAttr and listXAttr API calls to S3AFileSystem
> * Return all S3 object headers as XAttr attributes prefixed "header.". This
> covers both custom and standard headers (e.g. header.Content-Length).
> The setXAttr call isn't implemented, so for correctness the FS doesn't
> declare its support for the API in hasPathCapability().
> The magic commit file write sets the custom header
> x-hadoop-s3a-magic-data-length in the marker file to the length of the
> final data.
> A matching patch in Spark will look for the XAttr
> "header.x-hadoop-s3a-magic-data-length" when the file
> being probed for output data is zero bytes long.
> As a result, the job tracking statistics will report the
> bytes written but yet to be manifest.
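The probe described above could look something like the sketch below. It is
not the actual Spark patch: the class and method names are hypothetical, the
attribute map stands in for whatever a getXAttrs-style call returns, and it
assumes the header value is the decimal length stored as text, in the same way
the "header.Content-Length" value appears in the logged output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical helper; not the actual Spark or Hadoop code. */
final class MagicLengthProbe {

  /** XAttr name quoted in the issue description. */
  static final String MAGIC_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length";

  /**
   * Returns the length to report for a file.
   * Non-empty files report their own length; zero-byte files fall back to
   * the magic length attribute when it is present and parses as a number.
   */
  static long effectiveLength(long statusLength, Map<String, byte[]> xattrs) {
    if (statusLength != 0) {
      return statusLength;
    }
    byte[] raw = xattrs.get(MAGIC_LENGTH_XATTR);
    if (raw == null || raw.length == 0) {
      return 0;
    }
    try {
      // Assumption: the value is a decimal number stored as text.
      long declared = Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
      return Math.max(declared, 0);
    } catch (NumberFormatException e) {
      return 0; // unparseable header: keep the observed zero length
    }
  }
}
```

Falling back to zero on a missing or malformed attribute keeps the behaviour
identical to today's for stores that do not set the header.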