Steve Loughran created SPARK-33739:
--------------------------------------

             Summary: Magic committer files don't have the count of bytes 
written collected by spark
                 Key: SPARK-33739
                 URL: https://issues.apache.org/jira/browse/SPARK-33739
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1
            Reporter: Steve Loughran


Spark's write statistics tracking doesn't correctly report the size of files 
uploaded through the magic committer: it only calls getFileStatus on the 
zero-byte marker objects, not on the yet-to-manifest files. Since those files 
don't exist yet, there is no direct way to ask for their size.

HADOOP-17414 will attach the final length as a custom header to the marker 
object, and implement getXAttr in the S3A FS to probe for it.

BasicWriteStatsTracker can probe for this custom XAttr when the size of the 
generated file is 0 bytes; if the attribute is found and parses as a length, 
use that value as the declared length of the output.
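
The fallback described above could be sketched roughly as follows. This is 
only an illustration of the logic, not the actual patch: the class 
MagicLengthProbe, the attribute name in LENGTH_XATTR, and the lookup function 
are all hypothetical. The lookup stands in for a call such as Hadoop's 
FileSystem.getXAttr(Path, String), and the sketch assumes the header value is 
the length written as a decimal string.

```java
import java.nio.charset.StandardCharsets;
import java.util.OptionalLong;
import java.util.function.Function;

final class MagicLengthProbe {

    // Hypothetical attribute name, used here for illustration only.
    static final String LENGTH_XATTR = "header.x-example-data-length";

    private MagicLengthProbe() {}

    /** Parse attribute bytes as a non-negative decimal long; empty if absent or unparseable. */
    static OptionalLong parseLength(byte[] raw) {
        if (raw == null || raw.length == 0) {
            return OptionalLong.empty();
        }
        try {
            long n = Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
            return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /**
     * Return the length to record for an output file.
     * A non-zero length from getFileStatus is trusted as-is; only a
     * zero-byte file triggers the attribute probe.
     */
    static long resolveLength(long statusLength, Function<String, byte[]> xattrLookup) {
        if (statusLength != 0) {
            return statusLength;
        }
        byte[] raw;
        try {
            raw = xattrLookup.apply(LENGTH_XATTR);
        } catch (RuntimeException e) {
            // Assumption: a store without XAttr support fails here; keep the length at 0.
            return 0L;
        }
        return parseLength(raw).orElse(0L);
    }
}
```

Falling back to 0 whenever the attribute is missing, malformed, or unsupported 
keeps the existing behaviour for every filesystem that does not publish the 
header, so the probe can only improve the reported byte count.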


