weixiangsun commented on issue #7422:
URL: https://github.com/apache/pinot/issues/7422#issuecomment-1009346917


   Here is the document for Pre-Aggregation Gapfilling function design: 
https://docs.google.com/document/d/1FQbsTVywWLlmFaeUcYqDkb-XRWCpVhZe10BZyD2VzcU/edit#heading=h.brbishq9xbz1
   
   @siddharthteotia @jackjlli @amrishlal @lakshmanan-v and I had couple of 
rounds of design meetings. Thanks a lot for the effort!
   Here is the meeting minutes:
   1. We went through the design document and discussed the query design and 
implementation design. We reached the consensus that the implementation looks 
good. 
   2. We debated about the query design. We came out two feasible options:
   flat query option:
   SELECT 
   PREAGGREGATEGAPFILL(DATETIMECONVERT(event_time, '1:MILLISECONDS:EPOCH', 
'1:MILLISECONDS:EPOCH', '15:MINUTES'), 
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS', '2021-10-01 
09:00:00+00', '2021-10-01 12:00:00+00', '15:MINUTES', TIME_SERIES_ON(eventTime, 
lotId), FILTERING(is_occupied = 1), FILL(is_occupied,FILL_TYPE_PREVIOUS)) AS 
time_col,
   SUM(is_occupied) AS occupied_slots_count
   FROM parking_data
   WHERE event_time >= '10/01/2021 09:00:00+00' AND  event_time <= '10/01/2021 
12:00:00+00'
   GROUP BY time_col
   ORDER BY time_col
   
   and subquery option:
   SELECT time_col, SUM(is_occupied) AS occupied_slots_count
   FROM (
     SELECT PREAGGREGATEGAPFILL(DATETIMECONVERT(event_time, 
'1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES'), 
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS', '2021-10-01 
09:00:00+00', '2021-10-01 12:00:00+00', '15:MINUTES', TIME_SERIES_ON(eventTime, 
lotId), FILL(is_occupied,FILL_TYPE_PREVIOUS)) AS time_col, is_occupied
     FROM parking_data
     WHERE event_time >= '10/01/2021 09:00:00+00' AND  event_time <= 
'10/01/2021 12:00:00+00')
   WHERE isOccupied = 1
   GROUP BY time_col
   ORDER BY time_col
   
   As for the first option (flat query option), the PreAggregateGapFill 
function contains the hint about the pre-aggregate gapfilling. Everything about 
pre-aggregation is contained inside this function. But its problem is that the 
first selection expression will generate more rows. The aggregation selection 
expression will reduce the number of rows. This is counterintuitive. The 
execution order is not explicit even though the document can provide more 
details. Also for first option, we need define FILTERING inside the 
PreAggregateGapFill function in order to filter out the gapfilled result. The 
second option 
   
   As for the second option (the subquery option), it separates the gapfilling 
step and the aggregation step. The inner query logically defines the gapfilling 
step. The aggregation step is defined by the outer query. Its advantage is that 
the execution step is more explicit than the flat query option. Its problem is 
that the subquery feature is not in place. It might conflict with the future 
implementation of subquery. The solution is to make it specific in order to 
prevent the conflict. When subquery is in place, pre-aggregation gapfilling can 
migrated to leverage the subquery feature or be compatible with subquery 
feature. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to