jorisvandenbossche commented on code in PR #39065:
URL: https://github.com/apache/arrow/pull/39065#discussion_r1416924067


##########
cpp/src/arrow/dataset/file_parquet.cc:
##########
@@ -893,20 +902,29 @@ Result<std::vector<compute::Expression>> ParquetFileFragment::TestRowGroups(
     return std::vector<compute::Expression>{};
   }
 
+  const SchemaField* schema_field = nullptr;
   for (const FieldRef& ref : FieldsInExpression(predicate)) {
     ARROW_ASSIGN_OR_RAISE(auto match, ref.FindOneOrNone(*physical_schema_));
-
     if (match.empty()) continue;
-    if (statistics_expressions_complete_[match[0]]) continue;
-    statistics_expressions_complete_[match[0]] = true;
+    schema_field = &manifest_->schema_fields[match[0]];
+
+    for (size_t i = 1; i < match.indices().size(); ++i) {
+      if (schema_field->field->type()->id() != Type::STRUCT) {
+        return Status::Invalid("nested paths only supported for structs");
+      }

Review Comment:
   Yes, but AFAIK we currently don't support any predicate kernels for those 
data types either.
   
   For example, for a list column you can't do something like "list_field > 1", 
because 1) no such kernel is implemented, and 2) it doesn't really make sense 
anyway: a list scalar contains multiple values, so the comparison doesn't 
evaluate to a simple True/False. You need some kind of aggregation, like 
"elementwise_all(list_field > 1)" (i.e. are all (or any) of the values in a 
list scalar larger than 1?). 
   And even then, simplifying such a more complex expression based on the 
Parquet statistics would also need to be implemented.
   
   (I would like to see this work at some point, but that's certainly future 
work.)
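
   To make the point about aggregation concrete, here is a minimal sketch in 
plain standard C++. It does not use the Arrow API: `ListScalar`, `LeafStats`, 
`AllGreater`, `AnyGreater`, `SimplifyAllGreater` and `SimplifyAnyGreater` are 
all hypothetical names, and "elementwise_all" is not an existing kernel. It 
assumes min/max statistics are available for the flattened element values of a 
row group and ignores nulls entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for one list scalar (one row of a list<int64> column).
using ListScalar = std::vector<int64_t>;

// "list_field > bound" has no single boolean answer per row, so the
// aggregation has to be spelled out: all elements, or any element.
bool AllGreater(const ListScalar& row, int64_t bound) {
  return std::all_of(row.begin(), row.end(),
                     [bound](int64_t v) { return v > bound; });
}

bool AnyGreater(const ListScalar& row, int64_t bound) {
  return std::any_of(row.begin(), row.end(),
                     [bound](int64_t v) { return v > bound; });
}

// Assumed min/max statistics over the flattened element values of a row group.
struct LeafStats {
  int64_t min;
  int64_t max;
};

// Returns true if every row is guaranteed to pass all(list > bound);
// std::nullopt means "unknown, the rows must be scanned".
// Note: max <= bound does NOT prove "false for every row", because an empty
// list passes all() vacuously.
std::optional<bool> SimplifyAllGreater(const LeafStats& stats, int64_t bound) {
  assert(stats.min <= stats.max);
  if (stats.min > bound) return true;
  return std::nullopt;
}

// Returns false if no row can pass any(list > bound); std::nullopt otherwise.
// Note: min > bound does NOT prove "true for every row", because an empty
// list fails any().
std::optional<bool> SimplifyAnyGreater(const LeafStats& stats, int64_t bound) {
  assert(stats.min <= stats.max);
  if (stats.max <= bound) return false;
  return std::nullopt;
}
```

   Even in this toy version, empty lists already make the simplification 
asymmetric between "all" and "any", which is part of why this would need 
dedicated work rather than reusing the existing scalar comparison logic.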



