[ https://issues.apache.org/jira/browse/HADOOP-17414?focusedWorklogId=534387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-534387 ]

ASF GitHub Bot logged work on HADOOP-17414:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Jan/21 18:09
            Start Date: 11/Jan/21 18:09
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on a change in pull request #2530:
URL: https://github.com/apache/hadoop/pull/2530#discussion_r555242848



##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##########
@@ -4103,56 +4111,7 @@ private ObjectMetadata cloneObjectMetadata(ObjectMetadata source) {
     // in future there are new attributes added to ObjectMetadata
     // that we do not explicitly call to set here
     ObjectMetadata ret = newObjectMetadata(source.getContentLength());
-
-    // Possibly null attributes
-    // Allowing nulls to pass breaks it during later use
-    if (source.getCacheControl() != null) {
-      ret.setCacheControl(source.getCacheControl());
-    }
-    if (source.getContentDisposition() != null) {
-      ret.setContentDisposition(source.getContentDisposition());
-    }
-    if (source.getContentEncoding() != null) {
-      ret.setContentEncoding(source.getContentEncoding());
-    }
-    if (source.getContentMD5() != null) {
-      ret.setContentMD5(source.getContentMD5());
-    }
-    if (source.getContentType() != null) {
-      ret.setContentType(source.getContentType());
-    }
-    if (source.getExpirationTime() != null) {
-      ret.setExpirationTime(source.getExpirationTime());
-    }
-    if (source.getExpirationTimeRuleId() != null) {
-      ret.setExpirationTimeRuleId(source.getExpirationTimeRuleId());
-    }
-    if (source.getHttpExpiresDate() != null) {
-      ret.setHttpExpiresDate(source.getHttpExpiresDate());
-    }
-    if (source.getLastModified() != null) {
-      ret.setLastModified(source.getLastModified());
-    }
-    if (source.getOngoingRestore() != null) {
-      ret.setOngoingRestore(source.getOngoingRestore());
-    }
-    if (source.getRestoreExpirationTime() != null) {
-      ret.setRestoreExpirationTime(source.getRestoreExpirationTime());
-    }
-    if (source.getSSEAlgorithm() != null) {
-      ret.setSSEAlgorithm(source.getSSEAlgorithm());
-    }
-    if (source.getSSECustomerAlgorithm() != null) {
-      ret.setSSECustomerAlgorithm(source.getSSECustomerAlgorithm());
-    }
-    if (source.getSSECustomerKeyMd5() != null) {
-      ret.setSSECustomerKeyMd5(source.getSSECustomerKeyMd5());
-    }
-
-    for (Map.Entry<String, String> e : source.getUserMetadata().entrySet()) {
-      ret.addUserMetadata(e.getKey(), e.getValue());
-    }
-    return ret;
+    return getHeaderProcessing().cloneObjectMetadata(source, ret);

Review comment:
       will do, especially as it's a bit less brittle now
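
For context, a hedged sketch of what the delegated method might look like
(the name and signature come from the diff above; the body is an
illustrative assumption based on the deleted lines, not the actual
HeaderProcessing implementation):

    // Uses com.amazonaws.services.s3.model.ObjectMetadata from the AWS SDK.
    public ObjectMetadata cloneObjectMetadata(ObjectMetadata source,
        ObjectMetadata ret) {
      // Null-guarded copies: letting nulls through breaks later use.
      if (source.getContentType() != null) {
        ret.setContentType(source.getContentType());
      }
      if (source.getContentEncoding() != null) {
        ret.setContentEncoding(source.getContentEncoding());
      }
      // ...the same null-guarded copy for the remaining header attributes...
      source.getUserMetadata().forEach(ret::addUserMetadata);
      return ret;
    }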

##########
File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##########
@@ -4382,6 +4341,37 @@ public EtagChecksum getFileChecksum(Path f, final long length)
     }
   }
 
+  /**
+   * Get header processing support.
+   * @return the header processing of this instance.
+   */
+  private HeaderProcessing getHeaderProcessing() {
+    return headerProcessing;
+  }
+
+  @Override
+  public byte[] getXAttr(final Path path, final String name)
+      throws IOException {
+    return getHeaderProcessing().getXAttr(path, name);
+  }
+
+  @Override
+  public Map<String, byte[]> getXAttrs(final Path path) throws IOException {
+    return getHeaderProcessing().getXAttrs(path);
+  }
+
+  @Override
+  public Map<String, byte[]> getXAttrs(final Path path,
+      final List<String> names)
+      throws IOException {
+    return getHeaderProcessing().getXAttrs(path, names);
+  }
+
+  @Override
+  public List<String> listXAttrs(final Path path) throws IOException {
+    return headerProcessing.listXAttrs(path);

Review comment:
       ok






Issue Time Tracking
-------------------

    Worklog Id:     (was: 534387)
    Time Spent: 5h 40m  (was: 5.5h)

> Magic committer files don't have the count of bytes written collected by Spark
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-17414
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17414
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The Spark statistics tracking doesn't correctly assess the size of the
> uploaded files, as it only calls getFileStatus on the zero-byte objects
> rather than on the yet-to-manifest files; given that those don't exist
> yet, probing them directly isn't easy to do.
> Solution:
> * Add getXAttr and listXAttrs API calls to S3AFileSystem.
> * Return all S3 object headers, both custom and standard, as XAttr
> attributes with the prefix "header." (e.g. header.Content-Length); a
> usage sketch follows below.
> The setXAttr call isn't implemented, so for correctness the FS doesn't
> declare its support for the API in hasPathCapability().
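>
> As an illustration, a hedged sketch (not part of the patch) of reading a
> standard header back through the new API; the bucket and path are
> hypothetical, and this assumes the header value comes back as the bytes
> of its string form:
>
>   import java.nio.charset.StandardCharsets;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   Path path = new Path("s3a://example-bucket/output/part-00000");
>   FileSystem fs = path.getFileSystem(new Configuration());
>   // Standard headers are exposed with the "header." prefix.
>   byte[] raw = fs.getXAttr(path, "header.Content-Length");
>   if (raw != null) {
>     long contentLength = Long.parseLong(new String(raw, StandardCharsets.UTF_8));
>   }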
> When the magic committer writes a file, it sets the custom header
> x-hadoop-s3a-magic-data-length on the zero-byte marker file, recording
> the length of the final data.
> A matching patch in Spark will look for the XAttr
> "header.x-hadoop-s3a-magic-data-length" when the file being probed for
> output data is zero bytes long. As a result, the job tracking statistics
> will report the bytes which have been written but are yet to be
> manifested.
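>
> A hedged sketch of the Spark-side probe (the helper name is illustrative;
> the real Spark patch may differ), assuming the XAttr value is the decimal
> length as UTF-8 bytes:
>
>   import java.io.IOException;
>   import java.nio.charset.StandardCharsets;
>   import org.apache.hadoop.fs.FileStatus;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   long bytesWritten(FileSystem fs, Path path) throws IOException {
>     FileStatus st = fs.getFileStatus(path);
>     long len = st.getLen();
>     if (len == 0) {
>       try {
>         // Zero-byte marker: ask for the length of the yet-to-manifest data.
>         byte[] raw = fs.getXAttr(path, "header.x-hadoop-s3a-magic-data-length");
>         if (raw != null) {
>           len = Long.parseLong(new String(raw, StandardCharsets.UTF_8));
>         }
>       } catch (IOException | NumberFormatException e) {
>         // XAttrs unsupported, or header absent/invalid: fall back to zero.
>       }
>     }
>     return len;
>   }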


