[jira] [Work logged] (HADOOP-17414) Magic committer files don't have the count of bytes written collected by spark

ASF GitHub Bot (Jira) Tue, 08 Dec 2020 02:35:04 -0800


     [ 
https://issues.apache.org/jira/browse/HADOOP-17414?focusedWorklogId=521619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521619
 ]


ASF GitHub Bot logged work on HADOOP-17414:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Dec/20 10:34
            Start Date: 08/Dec/20 10:34
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on pull request #2530:
URL: https://github.com/apache/hadoop/pull/2530#issuecomment-740536282


   I have a more straightforward solution to this: have S3A implement the 
FileSystem.getXAttr API call so that it returns the object headers. There is 
no risk to other applications; all Spark will need to do is check for the 
header before looking for the file length, swallow any exceptions raised in 
the API call, and fall back to getFileStatus. Less than 10 lines.
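
   A minimal sketch of that caller-side logic, under stated assumptions: 
SimpleFs is a hypothetical stand-in for the two filesystem calls named above 
(it is not the Hadoop FileSystem class), and the attribute name 
"header.example-data-length" is invented for illustration; the comment does 
not say which header S3A would expose.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the two filesystem calls discussed above. */
interface SimpleFs {
    /** Returns the raw attribute value, or null if the attribute is absent. */
    byte[] getXAttr(String path, String name) throws IOException;

    /** Stands in for getFileStatus(path).getLen(). */
    long getFileLength(String path) throws IOException;
}

final class BytesWrittenProbe {
    // Assumed attribute name, for illustration only.
    static final String LENGTH_ATTR = "header.example-data-length";

    private BytesWrittenProbe() {
    }

    /**
     * Checks the header first; on any failure (unsupported call, missing
     * attribute, unparseable value) falls back to the reported file length.
     */
    static long bytesWritten(SimpleFs fs, String path) throws IOException {
        try {
            byte[] raw = fs.getXAttr(path, LENGTH_ATTR);
            if (raw != null && raw.length > 0) {
                String text = new String(raw, StandardCharsets.UTF_8).trim();
                return Long.parseLong(text);
            }
        } catch (IOException | RuntimeException e) {
            // Swallowed deliberately: the attribute lookup is best-effort.
        }
        // Fallback: whatever length the store reports for the object itself.
        return fs.getFileLength(path);
    }
}
```

   With this shape, filesystems that do not implement the attribute call 
simply take the fallback path, so the caller behaves as before on them.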


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 521619)
    Time Spent: 50m  (was: 40m)

> Magic committer files don't have the count of bytes written collected by spark
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-17414
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17414
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Spark's statistics tracking doesn't correctly assess the size of the 
> uploaded files, as it only calls getFileStatus on the zero-byte objects, not 
> on the yet-to-manifest files.
> Everything works with the staging committer purely because it's measuring the 
> length of the files staged to the local FS, not the unmaterialized output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
