[GitHub] [iceberg] wypoon commented on a diff in pull request #5720: API: Add rowsCount to ScanTask

GitBox Wed, 14 Sep 2022 07:42:49 -0700


wypoon commented on code in PR #5720:
URL: https://github.com/apache/iceberg/pull/5720#discussion_r970908373



##########
api/src/main/java/org/apache/iceberg/ContentScanTask.java:
##########
@@ -63,4 +63,10 @@
    * @return a residual expression to apply to rows from this scan
    */
   Expression residual();
+
+  @Override
+  default long estimatedRowsCount() {
+    double scannedFileFraction = ((double) length()) / file().fileSizeInBytes();
+    return (long) (scannedFileFraction * file().recordCount());

Review Comment:
   @aokolnychyi, there is a bug in this code, which I know came from #4446. For
Parquet and ORC files, `scannedFileFraction` is never 1.0, even when we scan the
whole file, because the split starts at an offset of a few bytes and `length()`
is therefore slightly smaller than `file().fileSizeInBytes()`.
   I put up a fix in #5755. Can you please review that?
   Also, can we name the method `estimatedRowCount` instead? That is more
idiomatic.
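
   To make the effect concrete, here is a minimal sketch of the same arithmetic
with the inputs passed in explicitly. The class name `ScanFractionDemo` and all
of the numbers (a 1 MiB file, a 4-byte split offset, one row per byte) are
hypothetical and chosen only so the floating-point math is exact; this is not
the actual Iceberg code, and it does not show what #5755 changes.

```java
public class ScanFractionDemo {
  // Same arithmetic as the default method in the diff above,
  // with length, file size, and record count passed in explicitly.
  public static long estimatedRowsCount(long length, long fileSizeInBytes, long recordCount) {
    double scannedFileFraction = ((double) length) / fileSizeInBytes;
    return (long) (scannedFileFraction * recordCount);
  }

  public static void main(String[] args) {
    long fileSizeInBytes = 1L << 20; // hypothetical 1 MiB file
    long recordCount = 1L << 20;     // hypothetical row count (keeps the math exact)
    long splitOffset = 4;            // assumed "few bytes" offset of the only split

    // The task covers the whole file's data, but its length excludes the offset.
    long length = fileSizeInBytes - splitOffset;
    long estimate = estimatedRowsCount(length, fileSizeInBytes, recordCount);

    System.out.println(estimate);               // prints 1048572
    System.out.println(recordCount - estimate); // prints 4
  }
}
```

   Under these assumptions the estimate falls short of the real row count even
though the task reads every row in the file.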



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


