jayzhan211 commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2613874994
For the scalar case, such as `SELECT a > -1`, consider any `SELECT a > b` where `b` is a constant. I think we can optimize this because we know the value. If the scalar is negative, the expression simplifies to `true`. Otherwise, we can rewrite it as `min(a, i64::max) > b OR (min(a, i64::max) = i64::max AND b = i64::max)`. For the column case, such as `SELECT a > b`, we can only rewrite to the latter form. I believe similar optimization rules exist for the other comparison operators, given that the type implies the range of the values.

For COALESCE and UNION, the scalar case can likewise be rewritten using the known value. For the column case, though, I think Decimal128 is the only viable option.

I quite agree that, in general, this should be handled in the optimizer, for example in `unwrap_cast_in_comparison` or in a physical optimizer rule based on column statistics.

Another question: is the u64+i64 combination common in DataFusion, and can we avoid it altogether? I guess u64 values larger than i64::max are uncommon, so can we aggressively use i64 even though we know the value is always positive?
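
As a rough illustration of the rewrite for `a > b` discussed in the comment above, here is a minimal standalone sketch in plain Rust. It assumes `a` is a `u64` and `b` is an `i64`, ignores SQL NULL handling, and does not use any DataFusion API; the function names `gt_reference` and `gt_rewritten` are hypothetical. One difference from the expression as written in the comment: the second clause here tests `a > i64::MAX` directly rather than `min(a, i64::max) = i64::max`, because the clamped equality would also match `a == i64::MAX`, where `a > b` should be false.

```rust
/// Reference semantics: widen both sides to i128 so the comparison
/// cannot overflow or wrap.
fn gt_reference(a: u64, b: i64) -> bool {
    (a as i128) > (b as i128)
}

/// Hypothetical helper: evaluates `a > b` using only i64 comparisons
/// after clamping `a` into the i64 range.
fn gt_rewritten(a: u64, b: i64) -> bool {
    // A non-null unsigned value is always greater than a negative one.
    if b < 0 {
        return true;
    }
    // min(a, i64::MAX): safe to cast because the result fits in i64.
    let clamped = a.min(i64::MAX as u64) as i64;
    // The clamp loses information only when a > i64::MAX, which matters
    // only when b is itself i64::MAX.
    clamped > b || (a > (i64::MAX as u64) && b == i64::MAX)
}
```

For example, `gt_rewritten(u64::MAX, i64::MAX)` and `gt_reference(u64::MAX, i64::MAX)` both return `true`, while both return `false` for `a = i64::MAX as u64` and `b = i64::MAX`.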