mustafasrepo commented on PR #9127: URL: https://github.com/apache/arrow-datafusion/pull/9127#issuecomment-1945936404
Thanks @Lordworms for this PR. I have examined this PR and it is really neat. However, I think we need to address a couple of points before merging: - During substitution you seem to be filtering out expressions unchanged. As an example, assume ordering is `[a ASC, b ASC]`. And projection expressions are `a, b, CAST(a as BIGINT) as cast_a`. In this case after projection valid orderings are `[a ASC, b ASC]` and `[cast_a ASC, b ASC]`. In this case, substitution should convert single ordering `[a ASC, b ASC]` possibly multiple orderings: `[a ASC, b ASC], [cast_a ASC, b ASC]` - Also substitution should work only in when function or operation has one-one-one mapping. e.g monotonicity alone is not a sufficient condition. As a counter example consider following table a | b -- | -- 1.5 | 1 1.5 | 2 2.0 | 1 2.0 | 2 where table satisfies ordering `[a ASC, b ASC]`. If projection the projection (a, b, CEIL(a)) is applied to this table. Result would be a | b | CEIL(a) | -- | -- | -- 1.5 | 1 | 2.0 1.5 | 2 | 2.0 2.0 | 1 | 2.0 2.0 | 2 | 2.0 where ordering `[a ASC, b ASC]` is still satisfied, but `[ceil(a) ASC, b ASC]` is not satisfied. Hence necessary condition for substitution is monotonicity and one-to-one mapping. For the same reason, `CAST` should substitute only when target type is larger than the source type. (e.g cast from `int16` to `int32`, etc.) Currently, we do not have `one-to-one` mapping analysis for Scalar functions. For this reason, until we have that support adding substitution support for `cast` to larger type is enough. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
