[ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546371#comment-16546371 ]
yucai commented on SPARK-24832:
-------------------------------
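Currently, when scanning ColumnarBatches, bytesRead is only updated when recordsRead lands exactly on a multiple of UPDATE_INPUT_METRICS_INTERVAL_RECORDS (1000). Because recordsRead grows by a whole batch at a time (e.g. 4096 rows), that check rarely succeeds. With full 4096-row batches it fires only every 512,000 rows (the least common multiple of 4096 and 1000, i.e. once every 125 batches), so the metric is out of date most of the time.
Can we update it once per batch instead?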
{code:java}
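// Runs once for every element returned by the scan iterator.
if (nextElement.isInstanceOf[ColumnarBatch]) {
  // A columnar batch adds all of its rows at once (e.g. 4096).
  inputMetrics.incRecordsRead(nextElement.asInstanceOf[ColumnarBatch].numRows())
} else {
  // Row-based scan: one record per element.
  inputMetrics.incRecordsRead(1)
}
// UPDATE_INPUT_METRICS_INTERVAL_RECORDS is 1000. With batch-sized
// increments, recordsRead rarely hits an exact multiple of it.
if (inputMetrics.recordsRead %
    SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0) {
  updateBytesRead()
}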
{code}
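To make the update frequency concrete, here is a small, self-contained Scala simulation. It is not Spark code: `BytesReadUpdateSim`, `refreshesWithModuloCheck` and `refreshesPerBatch` are hypothetical names, and the interval of 1000 and the 4096-row batch size are assumptions matching the values discussed above. It counts how many times bytesRead would be refreshed under the current modulo check versus a once-per-batch policy.

{code:scala}
// Hypothetical simulation of the two refresh policies; not Spark code.
object BytesReadUpdateSim {
  // Assumed to mirror SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS.
  val IntervalRecords = 1000L

  /** Current policy: add a whole batch to recordsRead, then refresh
    * bytesRead only if recordsRead is an exact multiple of the interval. */
  def refreshesWithModuloCheck(batchSizes: Seq[Int]): Int = {
    var recordsRead = 0L
    var refreshes = 0
    for (n <- batchSizes) {
      recordsRead += n
      if (recordsRead % IntervalRecords == 0) refreshes += 1
    }
    refreshes
  }

  /** Proposed policy: refresh bytesRead once for every batch. */
  def refreshesPerBatch(batchSizes: Seq[Int]): Int = batchSizes.size

  def main(args: Array[String]): Unit = {
    // 1000 full batches of 4096 rows = 4,096,000 rows in total.
    val batches = Seq.fill(1000)(4096)
    println(s"modulo check: ${refreshesWithModuloCheck(batches)} refreshes")
    println(s"per batch:    ${refreshesPerBatch(batches)} refreshes")
  }
}
{code}

For 1000 full batches the modulo check refreshes bytesRead 8 times (after batches 125, 250, ..., 1000), while the per-batch policy refreshes it 1000 times. For a row-based scan (batch size 1) both checks coincide with the intended "every 1000 records" behaviour; the mismatch only appears when recordsRead advances in batch-sized steps.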
> Improve inputMetrics's bytesRead update for ColumnarBatch
> ---------------------------------------------------------
>
> Key: SPARK-24832
> URL: https://issues.apache.org/jira/browse/SPARK-24832
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.3.1
> Reporter: yucai
> Priority: Major
>
> Improve inputMetrics's bytesRead update for ColumnarBatch
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)