bkietz commented on a change in pull request #8037:
URL: https://github.com/apache/arrow/pull/8037#discussion_r479471053



##########
File path: cpp/src/arrow/dataset/file_parquet.cc
##########
@@ -357,12 +357,14 @@ Result<ScanTaskIterator> 
ParquetFileFormat::ScanFile(std::shared_ptr<ScanOptions
                         GetReader(fragment->source(), options.get(), 
context.get()));
 
   if (!parquet_fragment->HasCompleteMetadata()) {
-    // row groups were not already filtered; do this now
-    RETURN_NOT_OK(parquet_fragment->EnsureCompleteMetadata(reader.get()));
-    ARROW_ASSIGN_OR_RAISE(row_groups,
-                          parquet_fragment->FilterRowGroups(*options->filter));
-    if (row_groups.empty()) {
-      return MakeEmptyIterator<std::shared_ptr<ScanTask>>();
+    // row groups were not already filtered; do this now (if there is a filter)
+    if (!options->filter->Equals(true)) {

Review comment:
       Simplifying the filter by the partition expression *would* work, but 
it's not safe without loading the physical schema since we need to validate the 
filter against that. Otherwise a filter without implicit casts (for example) 
would result in an assertion failure. Let's leave that optimization for a 
follow up; the right way to do this is probably to rewrite `Expression::Assume` 
to return a Result (allowing it to fail non-catastrophically even without a 
schema against which to validate), but that's a much larger project

##########
File path: cpp/src/arrow/dataset/file_parquet.cc
##########
@@ -357,12 +357,23 @@ Result<ScanTaskIterator> 
ParquetFileFormat::ScanFile(std::shared_ptr<ScanOptions
                         GetReader(fragment->source(), options.get(), 
context.get()));
 
   if (!parquet_fragment->HasCompleteMetadata()) {
-    // row groups were not already filtered; do this now
-    RETURN_NOT_OK(parquet_fragment->EnsureCompleteMetadata(reader.get()));
-    ARROW_ASSIGN_OR_RAISE(row_groups,
-                          parquet_fragment->FilterRowGroups(*options->filter));
-    if (row_groups.empty()) {
-      return MakeEmptyIterator<std::shared_ptr<ScanTask>>();
+    // row groups were not already filtered; do this now (if there is a filter)
+    if (!options->filter->Equals(true)) {
+      RETURN_NOT_OK(parquet_fragment->EnsureCompleteMetadata(reader.get()));
+      ARROW_ASSIGN_OR_RAISE(row_groups,
+                            
parquet_fragment->FilterRowGroups(*options->filter));
+      if (row_groups.empty()) {
+        return MakeEmptyIterator<std::shared_ptr<ScanTask>>();
+      }
+    } else {

Review comment:
       ```suggestion
       } else {
         // since we are not scanning this fragment with a filter, don't bother 
loading statistics
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to