Hi Sai,

Thank you for the mail. Your understanding is correct.

The systems you mentioned (Pinot, Kafka, Spark SQL) are built for different purposes, and each is a critical component of a data analytics pipeline.

Kafka is a pub-sub messaging system that is usually used to build streaming data pipelines.

Spark SQL adds a SQL interface layer on top of Spark. You can think of it as one of the offline OLAP query engines, like Hive or Presto, that are used to compute complex ad-hoc queries over large data sets.

Pinot aims to support "interactive" analytics use cases (e.g. dashboards, site-facing reporting such as "who's viewed my profile" on LinkedIn), where latency requirements are stricter than in offline reporting use cases.

You mentioned that "I can write a consumer that processes events from a stream and computes the needed analytic/metric and store it into a data store to serve". This is basically what Pinot does: it consumes data from a stream, indexes it for serving, and provides a query interface.

Below are some references that we have published. Please let us know if you have any other questions.

Best,
Seunghyun

cc.

LinkedIn Blog Posts
https://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot
https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator

Slides
https://www.slideshare.net/jeanfrancoisim/intro-to-pinot-20160104
https://www.slideshare.net/seunghyunlee1460/pinot-realtime-olap-for-530-million-users-sigmod-2018-107394584

On Tue, Mar 26, 2019 at 3:46 PM Sai Boorlagadda <[email protected]> wrote:

> Hello Devs,
>
> Is there any content or blog post to understand the difference between
> Pinot vs Kafka Streams (or Spark SQL)? At a high level after reading Pinot
> is actually bringing off-the-shelf components which otherwise data
> engineers have to write on top of KAFKA streams?
>
> I mean I can write a consumer that processes events from a stream and
> computes the needed analytic/metric and store it into a data store to
> serve. Though my application layer has to route the requests either to the
> streaming analytics datastore or batch analytics data store.
>
> So my understanding is Pinot abstracts these two things from the
> application. Is that correct?
>
> Sai
>
