Jackie-Jiang opened a new pull request #4156: Refactor HelixExternalViewBasedTimeBoundaryService to support all time units
URL: https://github.com/apache/incubator-pinot/pull/4156

Currently we pick the segment end time as the time boundary, and append the filter 'timeColumn < boundary' to the offline table and the filter 'timeColumn >= boundary' to the realtime table to achieve hybrid table federation.

The problem is that if the time unit is not DAYS (for example, MILLISECONDS) and the offline table has multiple daily segments to push, we might get incomplete results before all offline segments are pushed.

The solution is to always use (end time - 1 DAY) as the time boundary, and to append the filter 'timeColumn <= boundary' to the offline table and 'timeColumn > boundary' to the realtime table. This ensures that all daily-pushed or hourly-pushed segments are covered regardless of the time unit.

Also, we should use the time spec in the schema as the source of truth for the time column, because data is generated based on the schema. In the future we might remove the timeColumnName and timeType fields from SegmentsValidationAndRetentionConfig.
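The boundary rule described above can be sketched in Java as follows. This is a minimal illustration, not the code in the pull request: the class TimeBoundarySketch, its nested TimeBoundary holder, and the compute method are hypothetical names, and the sketch assumes that "end time" is the latest end time among the offline segments, expressed in the time unit taken from the schema's time spec.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the "(end time - 1 DAY)" time boundary rule. */
final class TimeBoundarySketch {

    /** Hypothetical holder for the boundary value and the two filters derived from it. */
    static final class TimeBoundary {
        final String timeColumn;
        final long value;

        TimeBoundary(String timeColumn, long value) {
            this.timeColumn = timeColumn;
            this.value = value;
        }

        /** Filter appended to the offline table query (boundary is inclusive). */
        String offlineFilter() {
            return timeColumn + " <= " + value;
        }

        /** Filter appended to the realtime table query (boundary is exclusive). */
        String realtimeFilter() {
            return timeColumn + " > " + value;
        }
    }

    /**
     * Computes the boundary as (end time - 1 DAY), with one day converted
     * into the time column's own unit.
     *
     * @param timeColumn        time column name (assumed to come from the schema)
     * @param timeUnit          unit of the time column (assumed to come from the schema)
     * @param maxOfflineEndTime assumed to be the latest offline segment end time, in timeUnit
     */
    static TimeBoundary compute(String timeColumn, TimeUnit timeUnit, long maxOfflineEndTime) {
        // One day in the column's unit: 1 for DAYS, 24 for HOURS, 86400000 for MILLISECONDS.
        long oneDay = timeUnit.convert(1, TimeUnit.DAYS);
        return new TimeBoundary(timeColumn, maxOfflineEndTime - oneDay);
    }
}
```

For example, with a MILLISECONDS time column named ts and a latest offline end time of 172800000, compute returns a boundary of 86400000, so the offline filter is 'ts <= 86400000' and the realtime filter is 'ts > 86400000'. Because the two filters are complementary, each row is served by exactly one of the two tables.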
