mapleFU commented on PR #36814:
URL: https://github.com/apache/arrow/pull/36814#issuecomment-1646753000

   > In general, for row group filtering (using range predicate by min/max) 
sort order does not matter,
   only the correct min/max values are needed.
   
   Hmm, for example, int32 and uint32 values can have identical encoded bytes, 
but they have to be compared differently. Strings, likewise, might use 
different collations. So just comparing the binary usually doesn't give the 
right result. As the spec says [1], "without column_orders, the meaning of 
the min_value and max_value fields in the Statistics object and the ColumnIndex 
object is undefined". So I guess this code cannot do the check.
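
A minimal sketch of the int32/uint32 point, assuming 4-byte little-endian values as in Parquet's PLAIN encoding for INT32. The `decode` helper and the sample values are hypothetical and only for illustration; this is not Arrow or Parquet code. The same two byte strings swap their min/max roles depending on whether they are read as signed or unsigned:

```python
import struct


def decode(raw: bytes, signed: bool) -> int:
    """Interpret 4 little-endian bytes as int32 (signed) or uint32 (unsigned)."""
    fmt = "<i" if signed else "<I"
    return struct.unpack(fmt, raw)[0]


# Two encoded values; only the interpretation differs below.
a = struct.pack("<i", -1)  # bytes ff ff ff ff
b = struct.pack("<i", 1)   # bytes 01 00 00 00

# Read as int32, `a` is the minimum (-1 < 1).
signed_min = min(a, b, key=lambda raw: decode(raw, signed=True))

# Read as uint32, `a` is the maximum (4294967295 > 1), so `b` is the minimum.
unsigned_min = min(a, b, key=lambda raw: decode(raw, signed=False))
```

So min/max written under one ordering cannot be used to prune under the other.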
   
   I think you can confirm this with a Photon contributor. As for solving it, 
you can prune using the thrift Statistics.
   
   [1] 
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L1049


