berkaysynnada commented on issue #7474:
URL: 
https://github.com/apache/arrow-datafusion/issues/7474#issuecomment-1706177337

   > I'm afraid I am not familiar with how the range analysis framework is 
implemented, so I am quite possibly missing something, but I'm not really sure 
how intervals can reliably be used in range analysis.
   > 
   > In particular, the basic identities below do **not** hold for intervals:
   > 
   > ```
   > a < b <=> a - b < 0
   > a + b = c <=> a = c - b
   > ```
   > 
   > The only circumstance in which these hold is for intervals that are 
durations, i.e. that are entirely encoded as a number of seconds or 
nanoseconds. If they contain a non-zero number of days, months, years, etc., 
they are no longer meaningful outside the context of a particular timestamp.
   > 
   > That being said, if the range analysis framework is only interested in 
bounding computation, we could potentially just use the maximum values for each 
of the constituent parts, so 25 hours for a day, 31 days for a month, etc. 🤔
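
To make the quoted point concrete, here is a minimal Python sketch of why the two identities break once an interval contains months. The helper `add_months` is hypothetical and assumes PostgreSQL-style day clamping (January 31 plus one month becomes the last day of February); it is not DataFusion's or Arrow's implementation.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Hypothetical helper: add calendar months, clamping the day of month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


# a + b = c does not imply a = c - b when b is "1 month".
a = date(2023, 1, 31)
c = add_months(a, 1)       # clamped to the end of February
back = add_months(c, -1)   # does not return to January 31

# Whether "1 month" is shorter or longer than "30 days" depends on the anchor,
# so a < b cannot be decided from the intervals alone.
jan, feb = date(2023, 1, 1), date(2023, 2, 1)
month_longer_in_jan = add_months(jan, 1) > jan + timedelta(days=30)
month_longer_in_feb = add_months(feb, 1) > feb + timedelta(days=30)

print(c, back, month_longer_in_jan, month_longer_in_feb)
```

With these inputs the round trip lands on January 28 rather than January 31, and the comparison of "1 month" against "30 days" flips between the January and February anchors.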
   
   Thanks for thinking about it. We would also prefer not to do range analysis 
on intervals, which is why I suggested converting to durations wherever the 
conversion is possible. However, there may be joins whose filter predicate 
contains an expression such as "timestamp_column - 1 month". Currently, pruning 
tables with that kind of filter does not work safely, and we want to fix this 
where possible.
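
As a rough illustration of the bounding idea for a predicate such as "timestamp_column - 1 month", the Python sketch below replaces the month interval with a pair of durations (shortest and longest possible month) and derives a conservative range for the result. All names here are hypothetical, and it assumes timezone-free timestamps, so a day is always 24 hours; with time zones the day bounds would have to widen, e.g. to the 25 hours mentioned above. It is a sketch of the approach, not the range analysis framework's actual code.

```python
import calendar
from datetime import datetime, timedelta
from typing import Tuple

# Assumption: without time zones, a calendar month spans 28 to 31 days.
MIN_MONTH = timedelta(days=28)
MAX_MONTH = timedelta(days=31)


def minus_months_bounds(
    lo: datetime, hi: datetime, months: int
) -> Tuple[datetime, datetime]:
    """Conservative [low, high] range for `ts - months` when lo <= ts <= hi."""
    if months < 0:
        raise ValueError("this sketch only handles non-negative month counts")
    # Subtract the longest possible span from the lower bound and the
    # shortest possible span from the upper bound.
    return lo - months * MAX_MONTH, hi - months * MIN_MONTH


def minus_months_exact(ts: datetime, months: int) -> datetime:
    """Exact calendar subtraction with day clamping, used only as a check."""
    index = ts.year * 12 + (ts.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return ts.replace(year=year, month=month0 + 1, day=min(ts.day, last_day))


lo, hi = datetime(2023, 3, 1), datetime(2023, 3, 31)
low, high = minus_months_bounds(lo, hi, 1)
print(low, high)
print(minus_months_exact(lo, 1), minus_months_exact(hi, 1))
```

The resulting range is wider than the exact one, which is the price of safety: a pruning decision based on `low` and `high` may keep data it could have skipped, but under the stated assumptions it should not drop rows that the exact calendar arithmetic would have matched.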


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
