Re: [I] [Proposal] Generic Time Series Engine in Pinot [pinot]

via GitHub Tue, 24 Sep 2024 03:57:18 -0700


ankitsultana commented on issue #13760:
URL: https://github.com/apache/pinot/issues/13760#issuecomment-2370935188

@gortiz thanks Gonzalo for sharing your views. You have raised several points
here and in the design document, but I think it's best to first establish the
broader context in which we are doing this.

As you know, most companies have been using Pinot mainly for OLAP on
"events". The idea is that you can power user-facing dashboards or other
end-user app UI elements, like counters, from real-time data with low latency.

At least at Uber, over the last 2 years we have seen an increase in the
number of users who use Pinot for alerting and monitoring ("metrics"). We also
recently shared at the Uber Meetup that we now use Pinot for our logging
platform too. (Side note: Pinot specifically solves the problem of
high-cardinality metrics.)

The term being used these days to cover this landscape of use-cases is
[MELT](https://www.youtube.com/watch?v=CgrDLykZ21I): Metrics, Events, Logs and
Traces.

Pinot can solve all of these problems, but not all of them equally well. Our
proposal aims to immediately improve Pinot's ability to tackle the Metrics and
Logging problems, and IMO bring it up to the state of the art in the HICAM
space. (Note that ClickHouse also announced in their latest release that they
[plan to support PromQL](https://clickhouse.com/blog/clickhouse-release-24-08).)

Re: your proposal about streaming queries in Pinot, I think it addresses a
different problem.

Re: your specific points that this is not necessary and that we could instead
leverage the MSE (Multistage Engine) or add some SQL extensions, I shared the
transformNull (gapfill) example in the doc, which should help clear that up.
For other readers, see the comment thread [at the bottom of this
doc](https://docs.google.com/document/d/1SBDDf71QZINYUjAbRSWguNMfbrWRGfdcF1JPi8SJZlM/edit).

Finally, this time-series engine may stick out like a sore thumb right now,
but once we integrate it with the Multistage Engine Shuffle Framework, I think
it will fit neatly with the rest of the code. From a use-case perspective, it
moves Pinot forward and expands its capabilities for use-cases that it can
theoretically support already, but not as well. (Side note: at a high level, I
am thinking in the direction of a common Operator interface in the SPI that is
oblivious to the data model: relational, time-series, etc.)

A potential future direction is to make the entire engine (above the V1
server-level engine) pluggable, but it makes sense to consider that only after
the time-series engine has proven successful.
