[GitHub] [arrow-datafusion] wolffcm commented on issue #4809: Support Gap Filling on Time Series Data

GitBox Tue, 10 Jan 2023 18:00:42 -0800


wolffcm commented on issue #4809:
URL: https://github.com/apache/arrow-datafusion/issues/4809#issuecomment-1378146101

@jiacai2050
> Subquery seems unnecessary, if time range in time_bucket_gapfill different
> with range in where clause, maybe we can overwrite where clause, and filter
> data in GapFill plan node, something like this(adopted from google docs above):

I understand what you're suggesting, but I worry that rewriting a filter
like that would have unforeseen effects that are difficult to understand. For
example, if the input to `Aggregate` was not a simple scan or filter, but
instead the output of a derived table or a join, it could be hard to do a
rewrite. What would the behavior be for that case?

I think this problem is a really tricky one. In the Timescale
[docs](https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/locf/)
for `locf()` there is a `prev` parameter which solves this problem. It is
basically a subquery that looks up the value to carry forward from before the
gap-filled time range. It's a little awkward to have to type it, but it has the
advantage of not requiring rewriting other parts of the plan.
```sql
locf(
  avg(temperature),
  (SELECT temperature FROM metrics m2
   WHERE m2.time < now() - INTERVAL '2 week'
     AND m.device_id = m2.device_id
   ORDER BY time DESC LIMIT 1)
)
```
I'm curious about what you think of that approach.
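
To make the role of `prev` concrete, here is a minimal Python sketch of last-observation-carried-forward over a list of time buckets. It is only an illustration, not Timescale's or DataFusion's implementation: the function name `locf`, its signature, and the sample values are all made up for this example. `None` stands for a bucket with no data, and `prev` plays the part of the subquery result, i.e. the last value seen before the queried range.

```python
from typing import List, Optional, Sequence


def locf(values: Sequence[Optional[float]],
         prev: Optional[float] = None) -> List[Optional[float]]:
    """Fill empty buckets (None) with the last observed value.

    `prev` is a hypothetical seed: the last value observed before the
    first bucket, used only until a real value is seen in the range.
    """
    filled: List[Optional[float]] = []
    last = prev  # value carried in from before the queried time range
    for v in values:
        if v is not None:
            last = v  # a real observation replaces the carried value
        filled.append(last)
    return filled


# Aggregated buckets where the first two have no data in the queried range.
buckets = [None, None, 21.5, None, 22.0]

print(locf(buckets))             # [None, None, 21.5, 21.5, 22.0]
print(locf(buckets, prev=20.0))  # [20.0, 20.0, 21.5, 21.5, 22.0]
```

Without `prev`, the leading buckets cannot be filled from data inside the range, which is the case the filter rewrite was trying to address. With `prev` supplied separately, the `WHERE` clause and the rest of the plan stay untouched.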

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
