wypoon commented on code in PR #5720:
URL: https://github.com/apache/iceberg/pull/5720#discussion_r970908373
##########
api/src/main/java/org/apache/iceberg/ContentScanTask.java:
##########
@@ -63,4 +63,10 @@
* @return a residual expression to apply to rows from this scan
*/
Expression residual();
+
+ @Override
+ default long estimatedRowsCount() {
+ double scannedFileFraction = ((double) length()) / file().fileSizeInBytes();
+ return (long) (scannedFileFraction * file().recordCount());
Review Comment:
@aokolnychyi there is a bug in this code, which I know was carried over from
#4446. For Parquet and ORC files, `scannedFileFraction` will never be 1.0, even
when we're scanning the whole file, because there is a split offset of a few
bytes, so `length()` comes out slightly smaller than `file().fileSizeInBytes()`.
I put up a fix in #5755. Can you please review that?
Also, can we name the method `estimatedRowCount` instead, as that is more
idiomatic?
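
To make the effect concrete, here is a minimal sketch of the same arithmetic with plain parameters in place of `file()` and `length()`. The numbers (a 1000-byte file, 100 records, a 4-byte split offset) are made up for illustration. `estimatedRowsCountAdjusted` is a hypothetical variant that subtracts the first split offset from the denominator. It is only an assumption about one way to compensate and is not necessarily what #5755 does.

```java
public class EstimatedRowsSketch {

  // Same arithmetic as the default method in the diff, with the inputs passed in.
  static long estimatedRowsCount(long length, long fileSizeInBytes, long recordCount) {
    double scannedFileFraction = ((double) length) / fileSizeInBytes;
    return (long) (scannedFileFraction * recordCount);
  }

  // Hypothetical adjustment (an assumption for illustration only):
  // exclude the bytes before the first split from the denominator.
  static long estimatedRowsCountAdjusted(
      long length, long fileSizeInBytes, long firstSplitOffset, long recordCount) {
    double scannedFileFraction = ((double) length) / (fileSizeInBytes - firstSplitOffset);
    return (long) (scannedFileFraction * recordCount);
  }

  public static void main(String[] args) {
    long fileSizeInBytes = 1000L; // made-up file size
    long recordCount = 100L;      // made-up record count
    long firstSplitOffset = 4L;   // made-up "few bytes" before the first split

    // Length covered when every split of the file is scanned.
    long scannedLength = fileSizeInBytes - firstSplitOffset;

    // Fraction is 0.996, so the cast truncates 99.6 down to 99.
    System.out.println(estimatedRowsCount(scannedLength, fileSizeInBytes, recordCount));

    // Fraction is 996 / 996 = 1.0, so the estimate is the full 100.
    System.out.println(
        estimatedRowsCountAdjusted(scannedLength, fileSizeInBytes, firstSplitOffset, recordCount));
  }
}
```

With these inputs the current formula reports 99 rows for a full scan of a 100-row file, while the adjusted variant reports 100.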
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]