du created SPARK-25237:
--------------------------
Summary: FileScanRdd's inputMetrics is wrong when selecting from a datasource
table with LIMIT
Key: SPARK-25237
URL: https://issues.apache.org/jira/browse/SPARK-25237
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.1, 2.2.2
Reporter: du
In FileScanRdd, the bytesRead of inputMetrics is updated with updateBytesRead
every 1000 rows and when the iterator is closed.
However, closing the iterator also invokes updateBytesReadWithFileSize, which
adds the file's full length to bytesRead on top of the value already taken
from the FileSystem statistics.
As a result, bytesRead is wrong for queries with a LIMIT, such as
select * from table limit 1: only a small part of the file is read, but the
whole file length is still counted.
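
The double counting described above can be reproduced with a small model. This
is a hypothetical sketch in Python, not Spark's Scala source: ModelScanIterator,
its fields, and the fixed bytes-per-row assumption are illustrative only. The
method names mirror updateBytesRead and updateBytesReadWithFileSize from the
report.

```python
# Hypothetical model of the bytesRead accounting described in SPARK-25237.
# Not Spark code: ModelScanIterator and its fields are illustrative names.

UPDATE_INTERVAL_RECORDS = 1000  # "every 1000 rows", per the report


class ModelScanIterator:
    def __init__(self, file_length: int, bytes_per_row: int) -> None:
        self.file_length = file_length      # full length of the file being scanned
        self.bytes_per_row = bytes_per_row  # assumption: fixed bytes consumed per row
        self.fs_bytes_read = 0              # stands in for Hadoop FileSystem statistics
        self.records_read = 0
        self.bytes_read = 0                 # stands in for inputMetrics' bytesRead

    def next_row(self) -> None:
        self.fs_bytes_read += self.bytes_per_row
        self.records_read += 1
        if self.records_read % UPDATE_INTERVAL_RECORDS == 0:
            self.update_bytes_read()

    def update_bytes_read(self) -> None:
        # Overwrite the metric with the bytes reported by the FileSystem statistics.
        self.bytes_read = self.fs_bytes_read

    def update_bytes_read_with_file_size(self) -> None:
        # Add the whole file length on top of the statistics-based value.
        self.bytes_read += self.file_length

    def close(self) -> None:
        # Behaviour described in the report: both updates run on close.
        self.update_bytes_read()
        self.update_bytes_read_with_file_size()


# "select * from table limit 1": one 100-byte row read from a 10 MB file.
scan = ModelScanIterator(file_length=10_000_000, bytes_per_row=100)
scan.next_row()
scan.close()
print(scan.fs_bytes_read, scan.bytes_read)  # 100 10000100
```

In this model the statistics account for 100 bytes, yet the reported metric is
inflated by the full file length.
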

Because Hadoop 2.5 and earlier are no longer supported, we can always get
bytesRead from the Hadoop FileSystem statistics instead of the file's length.
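
A minimal sketch of taking bytesRead from a single source, again as a Python
model rather than Spark code. reported_bytes_read is a hypothetical helper, and
modelling the statistics as an optional callback (absent only on the old Hadoop
versions the report implies) is an assumption.

```python
from typing import Callable, Optional


def reported_bytes_read(
    fs_stats_bytes_read: Optional[Callable[[], int]], file_length: int
) -> int:
    """Hypothetical helper: choose a single source for bytesRead.

    Use the FileSystem statistics whenever they are available; fall back to the
    file length only when no statistics callback exists (assumed here to be the
    Hadoop 2.5-and-earlier case).
    """
    if fs_stats_bytes_read is not None:
        return fs_stats_bytes_read()
    return file_length


# LIMIT 1 over a 10 MB file where the statistics report 100 bytes read.
print(reported_bytes_read(lambda: 100, 10_000_000))  # 100
print(reported_bytes_read(None, 10_000_000))         # 10000000
```

With Hadoop 2.5 and earlier unsupported, the callback is always present, so
under this model the file-length branch never runs and the extra file-size
update can be dropped.
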
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)