npawar commented on issue #3513: DATETIMECONVERT udf does not work for customized timezone and bucket size > 1 day
URL: https://github.com/apache/incubator-pinot/issues/3513#issuecomment-440098316
 
 
I was able to reproduce this issue. The root cause is this switch case for DAYS in [BaseDateTimeTransformer](https://github.com/apache/incubator-pinot/blob/master/pinot-core/src/main/java/com/linkedin/pinot/core/operator/transform/transformer/datetime/BaseDateTimeTransformer.java), where `_dateTimeTruncate` is created for converting millis to SDF. This code path is only taken when the final output is expected in SDF (simple date format):
   
```java
case DAYS:
        _dateTimeTruncate = (dateTime) -> _outputDateTimeFormatter.print(
          dateTime.withDayOfMonth(
            // getDayOfMonth() returns values in [1-31]; the integer division
            // by sz can produce 0, and withDayOfMonth(0) is out of range
            (dateTime.getDayOfMonth()/sz)*sz
          ).dayOfMonth().roundFloorCopy()
        );
        break;
```
   
The problem is that `dateTime.getDayOfMonth()` returns values in [1-31], while `sz` is the output granularity size. Whenever `sz` is greater than the day of month, the integer division gives `0`, and `withDayOfMonth(0)` is invalid.
For example, in your case outputGranularity=7 DAYS, hence sz=7, so the conversion fails for days 1-6 of every month.
This is also why `1:DAYS` works: division by 1 never produces `0`.
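To see the failure in isolation, here is a minimal standalone Joda-Time sketch (my own illustration, not Pinot code):

```java
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class DayBucketBug {
  public static void main(String[] args) {
    int sz = 7;  // output granularity: 7 DAYS
    // 2018-09-05: dayOfMonth = 5, and 5 / 7 = 0
    DateTime dateTime = new DateTime(2018, 9, 5, 0, 0, DateTimeZone.UTC);
    int bucketedDay = (dateTime.getDayOfMonth() / sz) * sz;  // 0
    // Throws org.joda.time.IllegalFieldValueException:
    // value 0 for dayOfMonth is outside the valid range [1,31]
    dateTime.withDayOfMonth(bucketedDay);
  }
}
```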
   
The right fix depends on what the correct output should be. The current logic always creates output based on the local calendar boundary (for cases where the output is in SDF).
For example, below is a list of UTC millis, one per day starting 2018-09-19T00:00:00. The second column is the output of applying
`dateTimeConvert(millis, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '7:DAYS')`
and the third column is the output of applying
`dateTimeConvert(millis, '1:MILLISECONDS:EPOCH', '1:DAYS:EPOCH', '7:DAYS')`.
**They are not equivalent.**
```
millis,         SDF (yyyyMMdd), EPOCH (days)
1537315200000L, 20180914,       17787
1537401600000L, 20180914,       17794
1537488000000L, 20180921,       17794
1537574400000L, 20180921,       17794
1537660800000L, 20180921,       17794
1537747200000L, 20180921,       17794
1537833600000L, 20180921,       17794
1537920000000L, 20180921,       17794
1538006400000L, 20180921,       17801
1538092800000L, 20180928,       17801
1538179200000L, 20180928,       17801
1538265600000L, 20180928,       17801
```
With the EPOCH output (third column), bucketing is done starting from the **epoch start**.
With the SDF output (second column), bucketing is done such that the **start of the month resets the start of the bucketing** (because dayOfMonth is used to set the day).
I'm not stating that this approach is right or wrong, just explaining how it works at the moment.
Other granularities behave the same way (hours bucketing resets at the day boundary, etc.).
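To make the difference concrete, here is a standalone sketch of the two bucketing computations (my own illustration of the behavior described above, not Pinot's actual code path):

```java
import java.util.concurrent.TimeUnit;

import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class BucketingComparison {
  public static void main(String[] args) {
    long millis = 1537315200000L;  // 2018-09-19T00:00:00 UTC
    int sz = 7;

    // EPOCH path: floor the epoch-day count to a multiple of sz
    long epochDays = TimeUnit.MILLISECONDS.toDays(millis);  // 17793
    long epochBucket = (epochDays / sz) * sz;               // 17787

    // SDF path (current behavior): floor the day-of-month to a multiple of sz
    DateTime dateTime = new DateTime(millis, DateTimeZone.UTC);
    int sdfBucketDay = (dateTime.getDayOfMonth() / sz) * sz;  // (19 / 7) * 7 = 14

    System.out.println(epochBucket);   // 17787 (counted from the epoch)
    System.out.println(sdfBucketDay);  // 14 -> 20180914 (reset at month start)
  }
}
```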
   
The fixes for this could be:
1) Preserve this behavior. In that case, for days 1-7 of the month (2018-09-01 to 2018-09-07) we would get one bucket (2018-09-01), for days 8-14 we would get a second bucket (2018-09-08), and so on.
This would be the easiest fix; we would only need to change the division in the code block pasted above (see the sketch after this list).
2) Make the bucketing for SDF also start from the epoch. This would need some refactoring of how the bucketing is done for the SDF cases. Not sure if anyone is using this UDF and depending on the current behavior. (@subprotocol)
3) Leave this as is for now. For your specific use case, you can use `dateTimeConvert(millis, '1:MILLISECONDS:EPOCH', '1:DAYS:EPOCH', '7:DAYS')` and convert the resulting epoch days to what you need later (e.g., epoch day 17787 is 2018-09-13).
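For reference, a sketch of what the fix in option 1 could look like (an untested assumption on my part: shift the day to 0-based before dividing):

```java
case DAYS:
        _dateTimeTruncate = (dateTime) -> _outputDateTimeFormatter.print(
          dateTime.withDayOfMonth(
            // shifting to 0-based before dividing keeps the result >= 1:
            // days 1-7 -> 1, days 8-14 -> 8, days 15-21 -> 15, ...
            ((dateTime.getDayOfMonth() - 1) / sz) * sz + 1
          ).dayOfMonth().roundFloorCopy()
        );
        break;
```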
   
   @kishoreg 
