shauryachats opened a new pull request, #15096:
URL: https://github.com/apache/pinot/pull/15096
The current GAPFILL implementation expects the time column series to always
be the first column in any subquery executed before the GAPFILL operation,
which is not an ideal behaviour since it breaks perfectly valid SQLs like:
```
SELECT
occupied,
GapFill(time_col, '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd
HH:mm:ss.SSS',
'2021-11-07 4:00:00.000',
'2021-11-07 12:00:00.000', '1:HOURS',
FILL(occupied, 'FILL_PREVIOUS_VALUE'),
TIMESERIESON(alias_levelId, alias_lotId)),
alias_lotId,
alias_levelId
FROM
(
SELECT
lastWithTime(isOccupied, eventTime, 'INT') as occupied,
cast(lotId as varchar) as alias_lotId,
cast(levelId as varchar) as alias_levelId,
DATETIMECONVERT(eventTime,'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS',
'1:HOURS') AS time_col
FROM
parkingData
WHERE
eventTime >= 1636257600000 AND eventTime <= 1636286400000
GROUP BY
alias_levelId, time_col,alias_lotId
LIMIT 200
)
LIMIT 200
```
The current implementation of `findTimeBucketColumnIndex` simply looks for
the GAPFILL expression in the relevant subquery and operates on the assumption
that the time bucket column is positioned in the same index as GAPFILL when
GAPFILL is stripped while execution. This assumption only works for queries
where GAPFILL is present in the lowest subquery and is the first to be
executed. It fails for queries where a subquery is executed before GAPFILL
(AGGREGATE_GAPFILL and AGGREGATE_GAPFILL_AGGREGATE).
This PR fixes this by recomputing the `timeBucketColumnIndex` for the
aforementioned GAPFILL types using the broker response to identify the index of
the time bucket column. It also patches the rest of the GAPFILL computation
logic to remove the assumption that GAPFILL is the first column in the subquery.
### Testing
Successfully tested that the fix works in TIMESTAMP quickstart. Also added
multiple unit tests to make sure each possible flow is working correctly with
the fix.
(Note: current unit tests did not have queries that trigger
CountGapfillProcessor and SumAvgGapfillProcessor and this PR adds tests for
those query shapes as well).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]