Thanks, Seunghyun, for confirming my understanding.

On Tue, Mar 26, 2019 at 5:06 PM Seunghyun Lee <[email protected]> wrote:

> Hi Sai,
>
> Thank you for the mail. Your understanding is correct.
>
> The systems you mentioned (Pinot, Kafka, Spark SQL) are built for
> different purposes, and all of them are critical components when you need
> to build a data analytics pipeline.
>
> Kafka is a pub-sub message delivery system that is usually used to build
> streaming data pipelines. Spark SQL adds a SQL interface layer on top of
> Spark. You can think of it as one of the offline OLAP query engines, like
> Hive or Presto, that are used for computing complex ad-hoc queries over
> large data.
>
> Pinot aims to support "interactive" analytics use cases (e.g. dashboards,
> site-facing reporting such as "Who's Viewed My Profile" on LinkedIn) where
> latency requirements are stricter than for offline reporting use cases.
>
> You mentioned that "I can write a consumer that processes events from a
> stream and computes the needed analytic/metric and store it into a data
> store to serve". This is basically what Pinot does (consuming data from a
> stream, indexing it for serving, and exposing a query interface).
>
> The links below are some references that we have published publicly.
> Please let us know if you have any other questions.
>
> Best,
> Seunghyun
>
> LinkedIn Blog Posts
>
> https://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot
> https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator
>
> Slides
> https://www.slideshare.net/jeanfrancoisim/intro-to-pinot-20160104
>
> https://www.slideshare.net/seunghyunlee1460/pinot-realtime-olap-for-530-million-users-sigmod-2018-107394584
>
> On Tue, Mar 26, 2019 at 3:46 PM Sai Boorlagadda <[email protected]>
> wrote:
>
> > Hello Devs,
> >
> > Is there any content or a blog post explaining the difference between
> > Pinot and Kafka Streams (or Spark SQL)? At a high level, after reading
> > about Pinot, my impression is that it provides off-the-shelf components
> > that data engineers would otherwise have to write on top of Kafka
> > Streams.
> >
> > I mean, I could write a consumer that processes events from a stream,
> > computes the needed analytics/metrics, and stores them in a data store
> > for serving. My application layer, though, would have to route requests
> > to either the streaming analytics data store or the batch analytics
> > data store.
> >
> > So my understanding is Pinot abstracts these two things from the
> > application. Is that correct?
> > Sai
> >
>
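The hand-rolled pattern Sai describes, and which Seunghyun notes Pinot packages up (consume a stream, maintain a metric, serve queries), can be sketched in miniature. This is a hypothetical simulation, not Pinot or Kafka code: an in-memory list stands in for a Kafka topic, and a dict stands in for the serving data store.

```python
# Hypothetical sketch of "write your own consumer" analytics:
# an in-memory event list plays the Kafka topic, a dict plays
# the serving store. Pinot's actual internals differ.
from collections import defaultdict

def consume_and_aggregate(events):
    """Consume profile-view events and keep a per-profile count,
    i.e. a hand-rolled 'who viewed my profile' metric."""
    store = defaultdict(int)   # stands in for the serving data store
    for event in events:       # stands in for the consumer poll loop
        store[event["viewed_profile"]] += 1
    return store

def query(store, profile_id):
    """The query interface the application layer would call."""
    return store.get(profile_id, 0)

# Example: three view events across two profiles.
stream = [
    {"viewer": "a", "viewed_profile": "p1"},
    {"viewer": "b", "viewed_profile": "p1"},
    {"viewer": "a", "viewed_profile": "p2"},
]
metrics = consume_and_aggregate(stream)
print(query(metrics, "p1"))  # → 2
```

The point of the thread is that Pinot replaces exactly this bespoke glue: ingestion from the stream, indexing for low-latency serving, and the query interface, plus the routing between real-time and batch data that Sai mentions his application layer would otherwise own.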
