NGA-TRAN commented on PR #5982:
URL: 
https://github.com/apache/arrow-datafusion/pull/5982#issuecomment-1511640259

   @waitingkuo 
   
   You have great tests. I agree they look inconsistent. I think we may want to 
define the behavior here and see whether we want to keep it or not:
   
   I think this one is easy to understand: the origin is Jan 31 so any other 
month will always be the last day of that month (28, 29, 30, 31 depending on  
the month and year)
   
   ```SQL
   ❯ select date_bin('1 month', timestamp '2000-02-29T00:00:00', timestamp 
'2000-01-31T00:00:00');
   
+----------------------------------------------------------------------------------+
   | datebin(Utf8("1 
month"),Utf8("2000-02-29T00:00:00"),Utf8("2000-01-31T00:00:00")) |
   
+----------------------------------------------------------------------------------+
   | 2000-02-29T00:00:00                                                        
      |
   
+----------------------------------------------------------------------------------+
   1 row in set. Query took 0.003 seconds.
   ```
   
   This is is tricky and we need to identify the behavior.  
   
   ```SQL
   ❯ select date_bin('1 month', timestamp '2000-01-31T00:00:00', timestamp 
'2000-02-29T00:00:00');
   
+----------------------------------------------------------------------------------+
   | datebin(Utf8("1 
month"),Utf8("2000-01-31T00:00:00"),Utf8("2000-02-29T00:00:00")) |
   
+----------------------------------------------------------------------------------+
   | 2000-01-29T00:00:00                                                        
      |
   
+----------------------------------------------------------------------------------+
   1 row in set. Query took 0.003 seconds.
   ```
   
   For February, 29 (or 28) is the last day of the month, but for other month 
is it not the last day of the month. So chromo's 
[Months](https://pola-rs.github.io/polars/chrono/struct.Months.html) that we 
are using in this PR do not consider 29 as the last day of the month and just 
simple add/minus 30 days which I think makes sense.
   
   ```SQL
   ❯ select date_bin('1 month', timestamp '2000-01-31T00:00:00', timestamp 
'2000-03-29T00:00:00');
   
+----------------------------------------------------------------------------------+
   | datebin(Utf8("1 
month"),Utf8("2000-01-31T00:00:00"),Utf8("2000-03-29T00:00:00")) |
   
+----------------------------------------------------------------------------------+
   | 2000-01-29T00:00:00                                                        
      |
   
+----------------------------------------------------------------------------------+
   1 row in set. Query took 0.005 seconds.
   ```
   
   So I think this is the behavior is acceptable. What do you, @alamb and 
@stuartcarnie think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to