Jackie-Jiang opened a new pull request #4156: Refactor HelixExternalViewBasedTimeBoundaryService to support all time units
URL: https://github.com/apache/incubator-pinot/pull/4156
 
 
   Currently we pick the latest offline segment end time as the time
   boundary, and append the filter 'timeColumn < boundary' to queries
   on the offline table and 'timeColumn >= boundary' to queries on the
   realtime table to federate the hybrid table. The problem is that if
   the time unit is not DAYS (for example, MILLISECONDS) and the
   offline table has multiple daily segments to push, the boundary
   falls at the very end of the latest day. That day is then served
   entirely from the offline table, so queries can return incomplete
   results until all of its offline segments are pushed.
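   A hypothetical sketch of the current routing rule, assuming the
   boundary equals the latest offline segment end time and that, in
   MILLISECONDS, that segment ends at the last millisecond of day D.
   The class and method names are illustrative, not Pinot APIs.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration of the current rule: offline serves
 * t < boundary, realtime serves t >= boundary. Not Pinot code.
 */
public class OldTimeBoundary {

  /** Returns true if a row with time value t would be read from the offline table. */
  static boolean servedByOffline(long t, long boundary) {
    return t < boundary;
  }

  public static void main(String[] args) {
    long dayMillis = TimeUnit.DAYS.toMillis(1);
    long dayD = 18000L * dayMillis; // start of an arbitrary day D, in epoch millis

    // DAYS unit: the latest offline segment ends on day D (value 18000),
    // so day D itself is served by the realtime table.
    boolean daysCase = servedByOffline(18000L, 18000L);

    // MILLISECONDS unit: the latest offline segment ends at 23:59:59.999 of day D,
    // so almost all of day D is served by the offline table.
    long boundaryMillis = dayD + dayMillis - 1;
    long noonOfDayD = dayD + dayMillis / 2;
    boolean millisCase = servedByOffline(noonOfDayD, boundaryMillis);

    System.out.println("DAYS: day D from offline? " + daysCase);
    System.out.println("MILLISECONDS: noon of day D from offline? " + millisCase);
  }
}
```

   Running it prints false for the DAYS case and true for the
   MILLISECONDS case: with millisecond timestamps, a partially pushed
   day D is read only from the offline table.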
   
   The solution is to always use (end time - 1 DAY) as the time
   boundary, and append 'timeColumn <= boundary' to the offline table
   query and 'timeColumn > boundary' to the realtime table query. This
   ensures that both daily-pushed and hourly-pushed segments are
   covered regardless of the time unit.
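   A minimal sketch of the proposed rule, assuming "1 DAY" is converted
   into the time column's unit with java.util.concurrent.TimeUnit. The
   class, method, and column names are hypothetical and do not reflect
   the actual HelixExternalViewBasedTimeBoundaryService code.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: boundary = endTime - 1 day (in the column's unit),
 * offline serves t <= boundary, realtime serves t > boundary.
 */
public class NewTimeBoundary {

  /** Subtracts one day, expressed in the given unit, from the segment end time. */
  static long computeBoundary(long endTime, TimeUnit unit) {
    return endTime - unit.convert(1, TimeUnit.DAYS);
  }

  static String offlineFilter(String timeColumn, long boundary) {
    return timeColumn + " <= " + boundary;
  }

  static String realtimeFilter(String timeColumn, long boundary) {
    return timeColumn + " > " + boundary;
  }

  public static void main(String[] args) {
    // DAYS: the latest offline segment ends on day 18000.
    long daysBoundary = computeBoundary(18000L, TimeUnit.DAYS);
    System.out.println(offlineFilter("daysSinceEpoch", daysBoundary));  // daysSinceEpoch <= 17999
    System.out.println(realtimeFilter("daysSinceEpoch", daysBoundary)); // daysSinceEpoch > 17999

    // MILLISECONDS: the latest offline segment ends at 23:59:59.999 of day 18000.
    long endMillis = TimeUnit.DAYS.toMillis(18001L) - 1;
    long msBoundary = computeBoundary(endMillis, TimeUnit.MILLISECONDS);
    // The boundary is now the last millisecond of day 17999.
    System.out.println(offlineFilter("tsMillis", msBoundary));
    System.out.println(realtimeFilter("tsMillis", msBoundary));
  }
}
```

   In both units the offline table answers up to the end of day 17999
   and the realtime table answers day 18000 onward, so a partially
   pushed latest day no longer leaks into the offline side.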
   
   Also, we should use the time spec in the schema as the source of
   truth for the time column, because data is generated based on the
   schema. In the future we might remove the timeColumnName and
   timeType fields from SegmentsValidationAndRetentionConfig.
