YoungRX commented on issue #34313:
URL: https://github.com/apache/arrow/issues/34313#issuecomment-1442724440

   > I think id < 12345 actually scans two row groups, the correct result for 
CountRows should be 20000
   > Shouldn't the correct result be 12344?
   
   Yes, the end result of the scan should be 12344 rows. But as you said, Parquet 
predicate pushdown filters at row-group granularity. For `id < 12345`, we 
actually scan two row groups. `ParquetFileFormat::CountRows` calls 
`ParquetFileFragment::FilterRowGroups` to get those two row groups. Together 
they hold 20000 rows, which is what `ParquetFileFormat::CountRows` should 
return.
   
   Specifically, `ParquetFileFormat::CountRows` counts the rows that remain 
after predicate pushdown filtering, that is, after filtering at the row-group 
level.
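   
   To make the difference between the two numbers concrete, here is a minimal, self-contained sketch. It does not use the Arrow API: `RowGroupStats`, `MakeRowGroups`, `CountRowsInSelectedRowGroups` and `CountMatchingRows` are hypothetical names, and the layout (consecutive ids starting at 1, 10000 rows per row group) is an assumption chosen to match the numbers above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for per-row-group column statistics; not an Arrow type.
struct RowGroupStats {
  int64_t min_id;    // smallest id stored in the row group
  int64_t max_id;    // largest id stored in the row group
  int64_t num_rows;  // number of rows in the row group
};

// Build consecutive row groups holding ids 1..total_rows (assumed layout).
std::vector<RowGroupStats> MakeRowGroups(int64_t total_rows,
                                         int64_t rows_per_group) {
  assert(total_rows >= 0 && rows_per_group > 0);
  std::vector<RowGroupStats> groups;
  for (int64_t first = 1; first <= total_rows; first += rows_per_group) {
    const int64_t last = std::min(first + rows_per_group - 1, total_rows);
    groups.push_back({first, last, last - first + 1});
  }
  return groups;
}

// Row-group-level count for `id < bound`: a row group is kept whenever its
// statistics cannot rule the predicate out, and all of its rows are counted.
int64_t CountRowsInSelectedRowGroups(const std::vector<RowGroupStats>& groups,
                                     int64_t bound) {
  int64_t count = 0;
  for (const auto& g : groups) {
    if (g.min_id < bound) count += g.num_rows;
  }
  return count;
}

// Row-level count for `id < bound`: only rows that satisfy the predicate.
int64_t CountMatchingRows(const std::vector<RowGroupStats>& groups,
                          int64_t bound) {
  int64_t count = 0;
  for (const auto& g : groups) {
    if (g.min_id >= bound) continue;  // no row in this group can match
    // ids are consecutive, so the matches are min_id..min(max_id, bound - 1)
    const int64_t last_match = std::min(g.max_id, bound - 1);
    count += last_match - g.min_id + 1;
  }
  return count;
}
```

   With `MakeRowGroups(30000, 10000)` (the total of 30000 rows is also an assumption) and `bound = 12345`, the row-group-level count is 20000 and the row-level count is 12344.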
   
   A possible solution for `ParquetFileFormat::CountRows` is as follows:
   
   > Delete `if (expressions[i] != compute::literal(true)) return 
util::nullopt;` from `ParquetFileFragment::TryCountRows`.
   
   Thanks for your answers.

