[ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546371#comment-16546371 ]
yucai commented on SPARK-24832:
-------------------------------

Currently, ColumnarBatch's bytesRead is only updated once every 4096 * 1000 rows, which leaves the metric out of date. Can we update it once per batch instead?

{code:java}
if (nextElement.isInstanceOf[ColumnarBatch]) {
  // Count every row in the batch, not just the batch itself.
  inputMetrics.incRecordsRead(nextElement.asInstanceOf[ColumnarBatch].numRows())
} else {
  inputMetrics.incRecordsRead(1)
}
if (inputMetrics.recordsRead % SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0) {
  updateBytesRead()
}
{code}

> Improve inputMetrics's bytesRead update for ColumnarBatch
> ---------------------------------------------------------
>
>                 Key: SPARK-24832
>                 URL: https://issues.apache.org/jira/browse/SPARK-24832
>             Project: Spark
>          Issue Type: Bug
>      Components: Spark Core, SQL
> Affects Versions: 2.3.1
>        Reporter: yucai
>        Priority: Major
>
> Improve inputMetrics's bytesRead update for ColumnarBatch
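To illustrate why the interval-based check goes stale for ColumnarBatch, here is a minimal, self-contained Java sketch (hypothetical names; `UPDATE_INTERVAL` stands in for `SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS`). Because `recordsRead` jumps by a whole batch at a time, the `% UPDATE_INTERVAL == 0` condition only fires when the running total happens to land exactly on a multiple of the interval; updating once per batch refreshes the metric far more often.

```java
public class BatchMetricsDemo {
    // Stand-in for SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS
    // (4096 * 1000, as mentioned in the comment above).
    static final long UPDATE_INTERVAL = 4096 * 1000L;

    /**
     * Simulates reading `batches` batches of `rowsPerBatch` rows each and
     * returns how many times updateBytesRead() would have been invoked.
     *
     * perBatch = false -> current behavior (modulo check on recordsRead)
     * perBatch = true  -> proposed behavior (refresh after every batch)
     */
    static long countUpdates(int batches, int rowsPerBatch, boolean perBatch) {
        long recordsRead = 0;
        long updates = 0;
        for (int i = 0; i < batches; i++) {
            recordsRead += rowsPerBatch; // incRecordsRead(batch.numRows())
            if (perBatch || recordsRead % UPDATE_INTERVAL == 0) {
                updates++;               // updateBytesRead()
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        // With 1000-row batches, recordsRead is a multiple of 4096000 only
        // every 4096 batches, so the metric is refreshed just twice here.
        long stale = countUpdates(10_000, 1000, false);
        // Per-batch updates refresh it on every one of the 10000 batches.
        long fresh = countUpdates(10_000, 1000, true);
        System.out.println(stale + " vs " + fresh); // 2 vs 10000
    }
}
```

This is only a model of the update cadence, not the actual Spark metrics code; the real `updateBytesRead()` also consults Hadoop `FileSystem.Statistics`, which this sketch does not attempt to reproduce.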