Thanks, Seunghyun, for confirming my understanding.

On Tue, Mar 26, 2019 at 5:06 PM Seunghyun Lee <[email protected]> wrote:
> Hi Sai,
>
> Thank you for the mail. Your understanding is correct.
>
> The systems you mentioned (Pinot, Kafka, Spark SQL) are built for
> different purposes, and they are all critical components when you need
> to build a data analytics pipeline.
>
> Kafka is a pub-sub message delivery system that is usually used to
> build streaming data pipelines. Spark SQL adds a SQL interface layer on
> top of Spark; you can think of it as one of the offline OLAP query
> engines, like Hive or Presto, that are used for computing complex
> ad-hoc queries over large data.
>
> Pinot aims to support "interactive" analytics use cases (e.g.
> dashboards, site-facing reporting such as "Who's viewed my profile?" on
> LinkedIn) where latency requirements are stricter than in offline
> reporting use cases.
>
> You mentioned that "I can write a consumer that processes events from a
> stream and computes the needed analytic/metric and store it into a data
> store to serve". This is basically what Pinot does: consuming data from
> a stream, indexing it for serving, and exposing a query interface.
>
> The links below are some references that we have published. Please let
> us know if you have any other questions.
>
> Best,
> Seunghyun
>
> LinkedIn Blog Posts
> https://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot
> https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator
>
> Slides
> https://www.slideshare.net/jeanfrancoisim/intro-to-pinot-20160104
> https://www.slideshare.net/seunghyunlee1460/pinot-realtime-olap-for-530-million-users-sigmod-2018-107394584
>
> On Tue, Mar 26, 2019 at 3:46 PM Sai Boorlagadda <[email protected]>
> wrote:
>
> > Hello Devs,
> >
> > Is there any content or blog post to understand the difference between
> > Pinot vs Kafka Streams (or Spark SQL)? At a high level, after reading
> > about Pinot, is it actually providing off-the-shelf components which
> > data engineers otherwise have to write on top of Kafka Streams?
> > I mean, I can write a consumer that processes events from a stream,
> > computes the needed analytic/metric, and stores it into a data store
> > to serve. Though my application layer has to route the requests either
> > to the streaming-analytics data store or the batch-analytics data
> > store.
> >
> > So my understanding is that Pinot abstracts these two things from the
> > application. Is that correct?
> >
> > Sai
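To make the comparison concrete, here is a minimal sketch of the hand-rolled approach Sai describes: a stream consumer that folds events into a pre-computed metric and serves it from its own store. This is the plumbing that, per Seunghyun's reply, Pinot packages into one system (consume, index, query). All names are hypothetical, and the event stream is stubbed as an in-memory list rather than a real Kafka consumer, purely for illustration.

```python
from collections import Counter

# Hypothetical event stream: in a real pipeline these records would
# arrive from a Kafka topic via a consumer loop, not an in-memory list.
events = [
    {"viewer": "alice", "profile": "sai"},
    {"viewer": "bob", "profile": "sai"},
    {"viewer": "alice", "profile": "seunghyun"},
]

# The "consumer": fold each event into the needed analytic/metric
# (here, profile-view counts), playing the role of the streaming job.
view_counts = Counter()
for event in events:
    view_counts[event["profile"]] += 1

# The "serving store": the lookup the application layer would query.
# Pinot collapses the consume/index/serve steps into one system and
# puts a SQL-like query interface in front of them instead.
def views_for(profile):
    return view_counts[profile]

print(views_for("sai"))  # 2
```

Note that this sketch only covers the streaming half; the routing between a streaming store and a batch store that Sai mentions is exactly the extra application-layer concern that a combined system avoids.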
