[ 
https://issues.apache.org/jira/browse/SPARK-21882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

linxiaojun updated SPARK-21882:
-------------------------------
    Description: For the first job launched by saveAsHadoopDataset, tasks 
running on each executor do not calculate the writtenBytes of OutputMetrics 
correctly (writtenBytes is 0). The cause is that the callback function used to 
look up the bytes written is initialized in the wrong order. The 
statisticsTable, which records statistics for each FileSystem, must be 
initialized first (this is triggered when SparkHadoopWriter is opened). The fix 
for this issue is to reorder the callback function initialization so that it 
happens after the writer is opened. (was: The first job called from 
saveAsHadoopDataset, running 
in each executor, does not calculate the writtenBytes of OutputMetrics 
correctly. The reason is that we did not initialize the callback function 
called to find bytes written in the right way. As usual, statisticsTable which 
records statistics in a FileSystem must be initialized at the beginning (this 
will be triggered when open SparkHadoopWriter). The solution for this issue is 
to adjust the order of callback function initialization. )

> OutputMetrics doesn't count written bytes correctly in the 
> saveAsHadoopDataset function
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-21882
>                 URL: https://issues.apache.org/jira/browse/SPARK-21882
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1, 2.2.0
>            Reporter: linxiaojun
>            Priority: Minor
>         Attachments: SPARK-21882.patch
>
>
> For the first job launched by saveAsHadoopDataset, tasks running on each 
> executor do not calculate the writtenBytes of OutputMetrics correctly 
> (writtenBytes is 0). The cause is that the callback function used to look up 
> the bytes written is initialized in the wrong order. The statisticsTable, 
> which records statistics for each FileSystem, must be initialized first (this 
> is triggered when SparkHadoopWriter is opened). The fix for this issue is to 
> reorder the callback function initialization so that it happens after the 
> writer is opened.
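The ordering problem can be illustrated with a small, self-contained Java model. It is only a sketch and does not use Spark or Hadoop: `Statistics`, `STATISTICS_TABLE`, `openFileSystem`, `bytesWrittenCallback` and `runTask` are hypothetical stand-ins. It assumes, as the description implies, that the bytes-written callback captures the statistics entries that exist at the moment it is created, and that a file system's entry only appears in the table once the file system is opened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical model of the SPARK-21882 ordering issue; not Spark/Hadoop API.
class OutputMetricsOrdering {

    // Hypothetical stand-in for a per-file-system statistics record.
    static final class Statistics {
        long bytesWritten;
    }

    // Hypothetical stand-in for the JVM-wide statisticsTable, keyed by scheme.
    static final Map<String, Statistics> STATISTICS_TABLE = new ConcurrentHashMap<>();

    // Opening a "file system" lazily registers its statistics entry.
    static Statistics openFileSystem(String scheme) {
        return STATISTICS_TABLE.computeIfAbsent(scheme, s -> new Statistics());
    }

    // Assumption: the callback snapshots the entries present *now*;
    // entries registered later are never seen by it.
    static LongSupplier bytesWrittenCallback() {
        List<Statistics> snapshot = new ArrayList<>(STATISTICS_TABLE.values());
        LongSupplier total = () -> snapshot.stream().mapToLong(st -> st.bytesWritten).sum();
        long baseline = total.getAsLong();
        return () -> total.getAsLong() - baseline;
    }

    // Simulates one task writing `bytes` bytes; `openFirst` selects the order.
    static long runTask(boolean openFirst, long bytes) {
        STATISTICS_TABLE.clear(); // fresh executor JVM: nothing opened yet
        LongSupplier callback;
        Statistics stats;
        if (openFirst) {
            stats = openFileSystem("hdfs");    // fixed order: open the writer first,
            callback = bytesWrittenCallback(); // then initialize the callback
        } else {
            callback = bytesWrittenCallback(); // buggy order: snapshot is empty
            stats = openFileSystem("hdfs");
        }
        stats.bytesWritten += bytes;           // the task writes its output
        return callback.getAsLong();           // value reported as writtenBytes
    }

    public static void main(String[] args) {
        System.out.println("callback before open: " + runTask(false, 1024)); // 0
        System.out.println("open before callback: " + runTask(true, 1024));  // 1024
    }
}
```

In the buggy order the callback's snapshot is empty, so it always reports 0 even though bytes were written; opening the writer before creating the callback makes the new entry visible to it.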



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
