This is an automated email from the ASF dual-hosted git repository.
jerrypeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.wiki.git
The following commit(s) were added to refs/heads/master by this push:
new a190d49 Updated PIP 18: Pulsar SQL (markdown)
a190d49 is described below
commit a190d49d6128a93533e5acbecd4fb4a1da83f752
Author: Boyang Jerry Peng
AuthorDate: Thu Jul 19 12:39:36 2018 -0700
Updated PIP 18: Pulsar SQL (markdown)
---
PIP-18:-Pulsar-SQL.md | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/PIP-18:-Pulsar-SQL.md b/PIP-18:-Pulsar-SQL.md
index 50333c7..716f64b 100644
--- a/PIP-18:-Pulsar-SQL.md
+++ b/PIP-18:-Pulsar-SQL.md
@@ -9,10 +9,13 @@ N/A
We are trying to create a method in which users can explore, in a natural
manner, the data already stored within Pulsar topics. We believe the best way
to accomplish this is to expose a SQL interface that allows users to query
existing data within a Pulsar cluster.
Just to be absolutely clear, the SQL we are proposing is for querying data
already in Pulsar and we are currently not proposing the implementation of any
sort of SQL on data streams.
-Why are we doing this?
-Many users are interested for such a feature. For example, many users store
large amounts of historical data in Pulsar for various purposes. Giving them
the capability to query that data gives them huge value. Users will
typically need to stream the data out of Pulsar and into another platform to do
any sort of analysis, but with Pulsar SQL, users can just use one platform.
-How are we going to do it?
+
+## Why are we doing this?
+
+Many users are interested in such a feature. For example, many users store
large amounts of historical data in Pulsar for various purposes. Giving them
the capability to query that data gives them huge value. Users will
typically need to stream the data out of Pulsar and into another platform to do
any sort of analysis, but with Pulsar SQL, users can just use one platform.
+
+## How are we going to do it?
With the implementation of a schema registry in Pulsar, data can be structured
so that it can be easily mapped to tables that can be queried by SQL. We plan
on using Presto (https://prestodb.io/) as the backbone of Pulsar SQL. A
connector can be implemented using the Presto connector SPI that allows Presto
to ingest data from Pulsar so that it can be queried using Presto’s existing SQL
framework.
@@ -21,7 +24,6 @@ The schema registry will be used to generate the structure of
tables that will b
Thus, Pulsar will be queried for metadata concerning topics and schemas and
from that metadata, we will go directly to the bookies to load and deserialize
the data.
-
## Goals
* Allow users to submit SQL queries using a Pulsar CLI
@@ -92,4 +94,4 @@ Let’s break the implementation into multiple phases:
6. Performance testing and optimizing
-More to come...
\ No newline at end of file
+More to come...