mustafasrepo commented on PR #9127:
URL: 
https://github.com/apache/arrow-datafusion/pull/9127#issuecomment-1945936404

   
   Thanks @Lordworms for this PR. I have examined this PR and it is really 
neat. However,
   I think we need to address a couple of points before merging:
   
   - During substitution you seem to be filtering out expressions unchanged. As 
an example, assume ordering is `[a ASC, b ASC]`. And projection expressions are 
`a, b, CAST(a as BIGINT) as cast_a`. In this case after projection valid 
orderings are `[a ASC, b ASC]` and `[cast_a ASC, b ASC]`. In this case, 
substitution should convert single ordering `[a ASC, b ASC]` possibly multiple 
orderings: `[a ASC, b ASC], [cast_a ASC, b ASC]`
   - Also substitution should work only in when function or operation has 
one-one-one mapping. e.g monotonicity alone is not a sufficient condition. As a 
counter example consider following table
   
   a | b
   -- | --
   1.5 | 1
   1.5 | 2
   2.0 | 1
   2.0 | 2
   
   where table satisfies ordering `[a ASC, b ASC]`. If projection the 
projection (a, b, CEIL(a)) is applied to this table. Result would be 
   
   a | b | CEIL(a) |
   -- | -- | --
   1.5 | 1 | 2.0
   1.5 | 2 | 2.0
   2.0 | 1 | 2.0
   2.0 | 2 | 2.0
   
   where ordering `[a ASC, b ASC]` is still satisfied, but `[ceil(a) ASC, b 
ASC]` is not satisfied. Hence necessary condition for substitution is 
monotonicity and one-to-one mapping. For the same reason, `CAST` should 
substitute only when target type is larger than the source type. (e.g cast from 
`int16` to `int32`, etc.)
   
   Currently, we do not have `one-to-one` mapping analysis for Scalar 
functions. For this reason, until we have that support adding substitution 
support for `cast` to larger type is enough.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to