Re: Pinot VS Kafka Streams (or Spark SQL)

Seunghyun Lee Tue, 26 Mar 2019 17:06:52 -0700

Hi Sai,

Thank you for the mail. Your understanding is correct.

The systems you mentioned (Pinot, Kafka, Spark SQL) are built for
different purposes, and each can be a critical component when you build
a data analytics pipeline.

Kafka is a pub-sub messaging system that is usually used to build
streaming data pipelines. Spark SQL adds a SQL interface layer on top of
Spark. You can think of it as one of the offline OLAP query engines, like
Hive or Presto, that are used for computing complex ad-hoc queries over
large data sets.

Pinot aims to support "interactive" analytics use cases (e.g. dashboards,
site-facing reporting such as "who's viewed my profile" on LinkedIn) where
latency requirements are stricter than for offline reporting use cases.

You mentioned that "I can write a consumer that processes events from a
stream and computes the needed analytic/metric and store it into a data
store to serve". This is basically what Pinot does: it consumes data from
a stream, indexes it for serving, and provides a query interface.
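
To make the consume / index / query flow concrete, here is a minimal toy
sketch in Python. It is not Pinot code and does not reflect Pinot's actual
design or API; the class name, column names, and events are all
hypothetical and only illustrate the three steps.

```python
from collections import defaultdict


class ToyRealtimeTable:
    """Toy stand-in for a real-time OLAP table (hypothetical, not Pinot)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = []                     # append-only row store
        self.inverted = defaultdict(list)  # column value -> list of row ids

    def consume(self, event):
        # Steps 1 and 2: ingest one event and index it right away,
        # so it is queryable as soon as it has been consumed.
        row_id = len(self.rows)
        self.rows.append(event)
        self.inverted[event[self.indexed_column]].append(row_id)

    def count_where(self, value):
        # Step 3: answer a "filter + count" query from the index alone,
        # without scanning every row.
        return len(self.inverted.get(value, []))


# Hypothetical "profile view" events that would normally arrive from a stream.
events = [
    {"viewer": "alice", "viewee": "sai"},
    {"viewer": "bob", "viewee": "sai"},
    {"viewer": "sai", "viewee": "alice"},
]

table = ToyRealtimeTable(indexed_column="viewee")
for e in events:
    table.consume(e)

views_of_sai = table.count_where("sai")
print(views_of_sai)
```

A hand-written consumer plus data store would have to provide each of
these pieces itself; the sketch only shows where they fit together.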

The links below are some references that we have published. Please
let us know if you have any other questions.

Best,
Seunghyun

cc.

LinkedIn Blog Posts
https://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot
https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator

Slides
https://www.slideshare.net/jeanfrancoisim/intro-to-pinot-20160104
https://www.slideshare.net/seunghyunlee1460/pinot-realtime-olap-for-530-million-users-sigmod-2018-107394584

On Tue, Mar 26, 2019 at 3:46 PM Sai Boorlagadda <[email protected]>
wrote:

> Hello Devs,
>
> Is there any content or blog post to understand the difference between
> Pinot vs Kafka Streams (or Spark SQL)? At a high level after reading Pinot
> is actually bringing off-the-shelf components which otherwise data
> engineers have to write on top of KAFKA streams?
>
> I mean I can write a consumer that processes events from a stream and
> computes the needed analytic/metric and store it into a data store to
> serve. Though my application layer has to route the requests either to the
> streaming analytics datastore or batch analytics data store.
>
> So my understanding is Pinot abstracts these two things from the
> application. Is that correct?
> Sai
>
