rouault opened a new issue, #41651:
URL: https://github.com/apache/arrow/issues/41651

   ### Describe the enhancement requested
   
   This enhancement request would be a continuation of the previous enhancement 
done in https://github.com/apache/arrow/pull/39065 to support nested fields 
where the nesting type is a struct.
   
   Here I would like to apply a predicate pushdown on the ``x`` subfield of a 
``list<element: struct<x: double not null, y: double not null>>``
   
   ```
   required group field_id=-1 schema {
     optional binary field_id=-1 id (String);
     optional group field_id=-1 geometry (List) {
       repeated group field_id=-1 list {
         optional group field_id=-1 element {
           required double field_id=-1 x;
           required double field_id=-1 y;
         }
       }
     }
   }
   ```
   
   When I pass the following expression to 
parquet::Dataset::ScanBuilder::Filter(),
   ```
   auto fieldRefX = arrow::FieldRef(arrow::FieldRef("geometry", "element"), "x");
   expression = cp::less_equal(cp::field_ref(fieldRefX),
                               cp::literal(m_sFilterEnvelope.MaxX));
   ```
   
   I get the following error: ``nested paths only supported for structs``
   
   (I tried to remove that check, but I then get the following error: 
``Function 'struct_field' has no kernel matching input types (list<element: 
struct<x: double not null, y: double not null>>)``)
   
   Beyond the technical difficulties of implementing this, I guess there is a 
potential ambiguity about what such filtering means. Is a row selected only if 
all corresponding entries in the list match the predicate, or if at least one 
does? For my use case (spatial filtering applied directly on 
[GeoArrow struct/separated encoded geometry 
columns](https://geoarrow.org/format.html), for non-Point geometry types, in 
GeoParquet files), the latter is what I'm looking for.
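   
   To make the two possible meanings concrete, here is a minimal plain C++ 
sketch that does not use the Arrow API. ``Point``, ``AnyXLessEqual`` and 
``AllXLessEqual`` are hypothetical names introduced only for illustration; 
they model one row of the ``geometry`` column as a ``std::vector``.
   
   ```cpp
   #include <algorithm>
   #include <vector>

   // Hypothetical stand-in for one element of the list<struct<x, y>> column.
   struct Point {
     double x;
     double y;
   };

   // "Any" semantics: keep the row if at least one vertex has x <= max_x.
   bool AnyXLessEqual(const std::vector<Point>& geometry, double max_x) {
     return std::any_of(geometry.begin(), geometry.end(),
                        [max_x](const Point& p) { return p.x <= max_x; });
   }

   // "All" semantics: keep the row only if every vertex has x <= max_x.
   // Note that std::all_of returns true for an empty list.
   bool AllXLessEqual(const std::vector<Point>& geometry, double max_x) {
     return std::all_of(geometry.begin(), geometry.end(),
                        [max_x](const Point& p) { return p.x <= max_x; });
   }
   ```
   
   For a bounding-box test against ``MaxX``, the "any" variant is the one that 
keeps every geometry that could intersect the envelope.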
   
   CC @jorisvandenbossche @paleolimbot 
   
   ### Component(s)
   
   C++, Parquet

