[jira] [Commented] (SPARK-24832) Improve inputMetrics's bytesRead update for ColumnarBatch

yucai (JIRA) Tue, 17 Jul 2018 03:40:38 -0700


    [ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546371#comment-16546371 ]


yucai commented on SPARK-24832:
-------------------------------

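Currently, bytesRead for ColumnarBatch input is only refreshed when recordsRead
is an exact multiple of SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS (1000).
Because recordsRead advances by a whole batch (4096 rows) at a time, that condition
rarely holds. With full 4096-row batches it is met only once every 125 batches
(512,000 rows), so the metric is out of date for most of the scan.
Can we update it for each batch instead?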

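{code:scala}
// Current logic in the scan iterator's next(): a ColumnarBatch advances
// recordsRead by its whole row count, a single row advances it by 1.
if (nextElement.isInstanceOf[ColumnarBatch]) {
  inputMetrics.incRecordsRead(nextElement.asInstanceOf[ColumnarBatch].numRows())
} else {
  inputMetrics.incRecordsRead(1)
}
// bytesRead is refreshed only when recordsRead lands exactly on a multiple
// of the interval, which batch-sized increments seldom hit.
if (inputMetrics.recordsRead % SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0) {
  updateBytesRead()
}
{code}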
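To make the effect of the modulo check concrete, here is a minimal standalone sketch (not Spark code) that simulates the counter. It assumes an interval of 1000 and takes batch sizes as plain integers. The object name BytesReadUpdateSketch and its methods are hypothetical and exist only for illustration. It compares the row counts at which the current check would call updateBytesRead() with the proposed per-batch update.

{code:scala}
// Hypothetical standalone simulation, not part of Spark: counts when the
// quoted modulo check would refresh bytesRead versus refreshing per batch.
object BytesReadUpdateSketch {
  // Assumed to mirror SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS.
  val IntervalRecords: Long = 1000L

  /** Cumulative row counts at which `recordsRead % IntervalRecords == 0`
    * holds after a batch is added, i.e. when updateBytesRead() would run. */
  def moduloCheckUpdates(batchSizes: Seq[Int]): Seq[Long] = {
    var recordsRead = 0L
    val updates = Seq.newBuilder[Long]
    for (n <- batchSizes) {
      recordsRead += n // same effect as incRecordsRead(batch.numRows())
      if (recordsRead % IntervalRecords == 0) updates += recordsRead
    }
    updates.result()
  }

  /** Proposed behaviour: refresh bytesRead after every batch, so an update
    * happens at each cumulative row count. */
  def perBatchUpdates(batchSizes: Seq[Int]): Seq[Long] =
    batchSizes.scanLeft(0L)(_ + _).tail
}
{code}

For 250 full 4096-row batches (1,024,000 rows), moduloCheckUpdates returns only Seq(512000, 1024000), while perBatchUpdates yields 250 updates. Once a partial batch shifts the running total off the 4096-row grid, the modulo check can also fire at irregular points or stop firing entirely.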

> Improve inputMetrics's bytesRead update for ColumnarBatch
> ---------------------------------------------------------
>
>                 Key: SPARK-24832
>                 URL: https://issues.apache.org/jira/browse/SPARK-24832
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.1
>            Reporter: yucai
>            Priority: Major
>
> Improve inputMetrics's bytesRead update for ColumnarBatch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
