lidavidm commented on a change in pull request #10060:
URL: https://github.com/apache/arrow/pull/10060#discussion_r626571972
##########
File path: cpp/src/arrow/dataset/file_parquet.cc
##########
@@ -385,6 +385,23 @@ Result<ScanTaskIterator> ParquetFileFormat::ScanFile(
return MakeVectorIterator(std::move(tasks));
}
+util::optional<Future<int64_t>> ParquetFileFormat::CountRows(
+ const std::shared_ptr<FileFragment>& file, Expression predicate,
+ std::shared_ptr<ScanOptions> options) {
+ auto parquet_file =
+ internal::checked_pointer_cast<ParquetFileFragment>(file);
+ if (FieldsInExpression(predicate).size() > 0) {
Review comment:
Ah, it turns out this doesn't work: right now we can only definitively
reject a row group; we can't decide that a row group should definitively be
included in the row count. For example,
`SimplifyWithGuarantee((i64 == 1), (i64 <= 1) & (i64 >= 1))` simplifies to
`i64 == 1`, not to `true`.
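To illustrate why counting needs more than rejection, here is a minimal, hypothetical sketch (not the Arrow or Parquet API; `RowGroupStats`, `Truth`, `ClassifyEquals`, and `CountRowsFromStats` are made-up names). It assumes each row group carries min/max statistics over its non-null int64 values plus a null count, and classifies `column == value` as always false, always true, or unknown. A metadata-only count is possible only if every row group lands in one of the two definite buckets; the "always true" bucket is exactly what simplification to `true` would provide.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for one row group's statistics on a single int64
// column. Assumes min/max are present and cover non-null values only.
struct RowGroupStats {
  int64_t min;
  int64_t max;
  int64_t null_count;
  int64_t num_rows;
};

// What the statistics tell us about a predicate for *every* row in a group.
enum class Truth { kAlwaysFalse, kAlwaysTrue, kUnknown };

// Classifies `column == value` for one row group using only its statistics.
Truth ClassifyEquals(const RowGroupStats& stats, int64_t value) {
  assert(stats.min <= stats.max);
  // No non-null value can equal `value`, and nulls never satisfy `==`,
  // so the whole group can be rejected (the case that already works).
  if (value < stats.min || value > stats.max) return Truth::kAlwaysFalse;
  // Every non-null value equals `value` and there are no nulls, so every
  // row matches. This is the "simplifies to true" case.
  if (stats.min == value && stats.max == value && stats.null_count == 0) {
    return Truth::kAlwaysTrue;
  }
  return Truth::kUnknown;
}

// Returns the number of matching rows if it can be computed from statistics
// alone, or std::nullopt if some row group would have to be scanned.
std::optional<int64_t> CountRowsFromStats(
    const std::vector<RowGroupStats>& row_groups, int64_t value) {
  int64_t total = 0;
  for (const RowGroupStats& rg : row_groups) {
    switch (ClassifyEquals(rg, value)) {
      case Truth::kAlwaysFalse:
        break;  // contributes zero rows
      case Truth::kAlwaysTrue:
        total += rg.num_rows;  // every row in the group matches
        break;
      case Truth::kUnknown:
        return std::nullopt;  // fall back to scanning
    }
  }
  return total;
}
```

With only rejection available, every group that is not rejected ends up as `kUnknown`, so the metadata-only path can never produce a count for a predicate that references a field.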
--
This is an automated message from the Apache Git Service.