berkaysynnada commented on issue #7474: URL: https://github.com/apache/arrow-datafusion/issues/7474#issuecomment-1706177337
> I'm afraid I am not familiar with how the range analysis framework is implemented, so I am quite possibly missing something, but I'm not really sure how intervals can reliably be used in range analysis.
>
> In particular the basic identities below do **not** hold for intervals
>
> ```
> a < b <=> a - b < 0
> a + b = c <=> a = c - b
> ```
>
> The only circumstances where these hold is for intervals that are durations, i.e. that are entirely encoded as a number of seconds or nanoseconds. If they contain a non-zero number of days, months or years, etc... they are no longer meaningful outside the context of a particular timestamp.
>
> That being said if the range analysis framework is only interested in bounding computation, we could potentially just use the maximum values for each of the constituent parts, so 25 hours for a day, 31 days for a month, etc... 🤔

Thanks for thinking about it. We would also prefer not to do range analysis on intervals, which is why I suggested converting them to durations wherever the conversion is possible. However, there may be joins whose filter predicate contains an expression such as `timestamp_column - 1 month`. Currently, pruning tables with that kind of filter is not safe, and we want to fix this where possible.
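
To make the bounding idea from the quoted comment concrete, here is a minimal sketch in Rust. It is not DataFusion's implementation: `CalendarInterval`, `duration_bounds`, and `bounds_after_subtracting` are hypothetical names invented for illustration, and the limits used (28 to 31 days per month, 23 to 25 hours per day) are assumptions. The sketch turns a calendar interval into a conservative `(min, max)` range in nanoseconds, then uses that range to bound `timestamp_column - interval` given the column's min and max statistics.

```rust
/// Hypothetical stand-in for a month/day/nanosecond interval.
/// This is NOT the Arrow or DataFusion type, only an illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CalendarInterval {
    pub months: i64,
    pub days: i64,
    pub nanos: i64,
}

pub const NANOS_PER_HOUR: i64 = 3_600_000_000_000;

/// Returns a conservative (min, max) number of nanoseconds the interval
/// can span, whichever timestamp it is applied to.
/// Overflow is ignored here; real code would use checked arithmetic.
pub fn duration_bounds(iv: CalendarInterval) -> (i64, i64) {
    // Assumption: a month spans between 28 and 31 days.
    // Taking min/max of both products also handles negative month counts.
    let (m_a, m_b) = (iv.months * 28, iv.months * 31);
    let day_lo = m_a.min(m_b) + iv.days;
    let day_hi = m_a.max(m_b) + iv.days;

    // Assumption: a day spans between 23 and 25 hours (one-hour DST shift).
    // The product days * hours is extreme at one of the four corners.
    let corners = [day_lo * 23, day_lo * 25, day_hi * 23, day_hi * 25];
    let hour_lo = *corners.iter().min().unwrap();
    let hour_hi = *corners.iter().max().unwrap();

    (
        hour_lo * NANOS_PER_HOUR + iv.nanos,
        hour_hi * NANOS_PER_HOUR + iv.nanos,
    )
}

/// Bounds `ts - iv` for every `ts` in `[ts_min, ts_max]` (nanoseconds).
/// Subtracting the largest span gives the lower bound and vice versa.
pub fn bounds_after_subtracting(
    ts_min: i64,
    ts_max: i64,
    iv: CalendarInterval,
) -> (i64, i64) {
    let (lo, hi) = duration_bounds(iv);
    (ts_min - hi, ts_max - lo)
}
```

For a pure duration (zero months and days) the two bounds coincide, so nothing is lost in that case. For intervals with months or days the range is deliberately loose (for example, 12 months is bounded by 12 × 31 days even though no year is that long), which trades precision for safety: pruning based on it may keep more data than necessary but should not discard matching rows, provided the stated assumptions hold.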
