thisisnic commented on issue #29896:
URL: https://github.com/apache/arrow/issues/29896#issuecomment-4862621485

   Still reproducible on main (3f4a04ee9f).
   
   The root cause is in the C++ dataset evolution strategy. 
[`BasicFragmentEvolution::DevolveFilter`](https://github.com/apache/arrow/blob/3f4a04ee9fb40a5db78cb5257b88950599c1c11a/cpp/src/arrow/dataset/dataset.cc#L392-L416)
 remaps field references from the dataset schema to the fragment schema, but 
it does not reconcile type mismatches. As a result, a filter literal typed to 
the dataset schema (string) is compared against raw fragment data (int32). An [existing 
comment](https://github.com/apache/arrow/blob/3f4a04ee9fb40a5db78cb5257b88950599c1c11a/cpp/src/arrow/dataset/file_parquet.cc#L275-L276)
 already acknowledges this limitation.
   
   The fix would be for `DevolveFilter` to insert a `cast` expression on the 
field reference when the dataset and fragment types differ, so that the filter 
kernel sees matching types.
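
   To make the proposed rewrite concrete, here is a minimal sketch in plain 
Python. It is a toy model and not Arrow code: `FieldRef`, `Literal`, `Cast`, 
`Equal`, `devolve_filter` and `evaluate` are hypothetical stand-ins for the 
C++ expression machinery, and schemas are reduced to name-to-type dicts. It 
only illustrates the idea of wrapping a field reference in a cast when the 
dataset and fragment types differ.

```python
from dataclasses import dataclass


# Toy expression nodes. These are illustrative stand-ins, NOT Arrow's
# compute::Expression types.
@dataclass(frozen=True)
class FieldRef:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object
    type: str


@dataclass(frozen=True)
class Cast:
    operand: object
    to_type: str


@dataclass(frozen=True)
class Equal:
    left: object
    right: object


def devolve_filter(expr, dataset_schema, fragment_schema):
    """Rewrite a filter written against the dataset schema for one fragment.

    Assumption for this sketch: schemas are dicts mapping field name to a
    type name, and every referenced field exists in both schemas.
    """
    if isinstance(expr, Equal):
        return Equal(
            devolve_filter(expr.left, dataset_schema, fragment_schema),
            devolve_filter(expr.right, dataset_schema, fragment_schema),
        )
    if isinstance(expr, FieldRef):
        dataset_type = dataset_schema[expr.name]
        if fragment_schema[expr.name] != dataset_type:
            # Types differ: cast the fragment column up to the dataset type
            # so the comparison sees matching types on both sides.
            return Cast(expr, dataset_type)
        return expr
    return expr  # literals are already typed to the dataset schema


def evaluate(expr, row):
    """Evaluate a toy expression against a single row (a dict)."""
    if isinstance(expr, FieldRef):
        return row[expr.name]
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Cast):
        converters = {"string": str, "int32": int}
        return converters[expr.to_type](evaluate(expr.operand, row))
    if isinstance(expr, Equal):
        return evaluate(expr.left, row) == evaluate(expr.right, row)
    raise TypeError(f"unsupported expression: {expr!r}")


dataset_schema = {"x": "string"}   # unified dataset schema
fragment_schema = {"x": "int32"}   # what this fragment actually stores
rows = [{"x": 1}, {"x": 2}]        # raw fragment data

dataset_filter = Equal(FieldRef("x"), Literal("1", "string"))
fragment_filter = devolve_filter(dataset_filter, dataset_schema, fragment_schema)

print(evaluate(dataset_filter, rows[0]))   # False: int 1 compared with "1"
print(evaluate(fragment_filter, rows[0]))  # True: str(1) compared with "1"
```

   In the toy model the cast is placed on the field reference rather than on 
the literal, matching the proposal above, so the literal keeps the type the 
user wrote against the dataset schema.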


