du created SPARK-25237:
--------------------------

             Summary: FileScanRDD's inputMetrics is wrong when selecting from a 
datasource table with limit
                 Key: SPARK-25237
                 URL: https://issues.apache.org/jira/browse/SPARK-25237
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1, 2.2.2
            Reporter: du


In FileScanRDD, we update inputMetrics's bytesRead via updateBytesRead every 
1000 rows and again when the iterator is closed.

However, when the iterator is closed we also invoke 
updateBytesReadWithFileSize, which increases inputMetrics's bytesRead by the 
file's length.

As a result, inputMetrics's bytesRead is wrong for a query with a limit, such 
as select * from table limit 1: the file's full length is added on top of the 
bytes actually read.
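
To make the over-count concrete, here is a small Python model of the 
bookkeeping described above. It is not Spark code: the class 
ScanIteratorModel, the helper scan_with_limit, the 1,000,000-byte file and the 
100 bytes per row are all made up for illustration, and it assumes the 
FileSystem statistics grow by exactly one row's bytes per row read (a real 
reader would fetch larger buffers).

```python
# Simplified model of the metric updates described in this issue.
# NOT Spark code; all names and numbers are illustrative assumptions.

class ScanIteratorModel:
    UPDATE_INTERVAL = 1000  # rows between periodic metric updates

    def __init__(self, file_length, bytes_per_row):
        self.file_length = file_length
        self.bytes_per_row = bytes_per_row
        self.fs_bytes_read = 0      # stands in for Hadoop FileSystem statistics
        self.rows_read = 0
        self.bytes_read_metric = 0  # stands in for inputMetrics's bytesRead

    def update_bytes_read(self):
        # Set the metric from the FileSystem statistics.
        self.bytes_read_metric = self.fs_bytes_read

    def update_bytes_read_with_file_size(self):
        # Add the whole file length on top of the current metric value.
        self.bytes_read_metric += self.file_length

    def next_row(self):
        self.fs_bytes_read += self.bytes_per_row
        self.rows_read += 1
        if self.rows_read % self.UPDATE_INTERVAL == 0:
            self.update_bytes_read()

    def close(self):
        # Both updates run on close, which is the behaviour reported here.
        self.update_bytes_read()
        self.update_bytes_read_with_file_size()


def scan_with_limit(limit, file_length=1_000_000, bytes_per_row=100):
    """Read `limit` rows, close, and return (reported metric, bytes really read)."""
    it = ScanIteratorModel(file_length, bytes_per_row)
    for _ in range(limit):
        it.next_row()
    it.close()
    return it.bytes_read_metric, it.fs_bytes_read


reported, actual = scan_with_limit(1)
print(reported, actual)
```

In this model a limit 1 scan really reads 100 bytes, but the reported metric 
also includes the full file length.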

Because Hadoop 2.5 and earlier are no longer supported, we always get 
bytesRead from the Hadoop FileSystem statistics rather than from the file's 
length, so the file-size update is unnecessary.
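
The difference between the two policies can be sketched as a single function. 
This is only an illustration under assumed numbers, not the actual patch: 
reported_bytes_read and its parameters are hypothetical names, and 4096 and 
1,000,000 are made-up values for the bytes read and the file length.

```python
# Hypothetical helper comparing the metric value after close() under
# the current behaviour and under a statistics-only policy.
# NOT Spark code; names and numbers are illustrative assumptions.

def reported_bytes_read(fs_stats_bytes, file_length, add_file_length_on_close):
    """Return the modelled bytesRead value once the iterator is closed."""
    reported = fs_stats_bytes          # value taken from FileSystem statistics
    if add_file_length_on_close:
        reported += file_length        # extra increment described in this issue
    return reported


current = reported_bytes_read(4096, 1_000_000, add_file_length_on_close=True)
proposed = reported_bytes_read(4096, 1_000_000, add_file_length_on_close=False)
print(current, proposed)
```

With the file-length increment dropped, the modelled metric equals the bytes 
taken from the FileSystem statistics.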

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
