This is an automated email from the ASF dual-hosted git repository.
jlli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.wiki.git
The following commit(s) were added to refs/heads/master by this push:
new 59c5b78 Updated How To Use Pinot (markdown)
59c5b78 is described below
commit 59c5b78c018f244dcf397d5089c462c125ffbab7
Author: Jialiang Li <[email protected]>
AuthorDate: Tue Feb 5 14:42:06 2019 -0800
Updated How To Use Pinot (markdown)
---
How-To-Use-Pinot.md | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/How-To-Use-Pinot.md b/How-To-Use-Pinot.md
index 559298b..0d2b07d 100644
--- a/How-To-Use-Pinot.md
+++ b/How-To-Use-Pinot.md
@@ -19,7 +19,7 @@ from the [Bureau of Transportation Statistics](http://www.rita.dot.gov/bts/).
## Data modeling
-First off, we'll define our table schema. Table schemas in Pinot are defined
+First off, let's define our table schema. Table schemas in Pinot are defined
using JSON; for each column, its name, data type and column type (metric,
dimension or time column) is specified.
@@ -128,7 +128,7 @@ cluster. In a Pinot cluster, there are three types of nodes:
- Controller: Monitors the cluster to ensure that there are enough nodes serving
a given table and coordinates operations between nodes
- Broker: Receives queries, manages scatter-gather of query results and returns
- query results to clients
+ query results back to clients
- Server: Holds the data that is queried and answers queries from the brokers
In this tutorial, we will only start a single instance of each type of node. In
@@ -165,10 +165,10 @@ parallelism.
## Workflow set up
-Pinot can ingest data that comes from a real time data source like Kafka or a
+Pinot can ingest data that come from a real time data source like Kafka or a
batch processing system like Hadoop.
-
+### Pinot hybrid workflow[[Pinot - Hybrid flow.png]]
First, we'll set up a workflow that uses Hadoop. This allows us to pre-aggregate
the data, join it with other data sources, tidy it up and so on; in this case,
@@ -207,8 +207,8 @@ Once our schema is uploaded, we need to create a table definition. The table def
"segmentAssignmentStrategy" : "BalanceNumSegmentAssignmentStrategy"
},
"tenants" : {
- "broker":"brokerOne",
- "server":"serverOne"
+ "broker":"DefaultTenant_BROKER",
+ "server":"DefaultTenant_SERVER"
},
"tableIndexConfig" : {
"invertedIndexColumns" : ["Carrier"],
@@ -242,7 +242,7 @@ PostQueryCommand - Result: {"traceInfo":{},"numDocsScanned":0,"aggregationResult
The offline workflow for Pinot is comprised of three parts:
-- Transform the source data into the desired Avro-formatted data to be ingested by Pinot
+- Transforming the source data into the desired Avro-formatted data to be ingested by Pinot
- Creating index segments out of the Avro-formatted data
- Pushing the index segments to the Pinot cluster
@@ -306,7 +306,7 @@ Oozie.
### Realtime flow
Sometime, depending on the use for Pinot, fresher data is
-required than what would be acheievable through pushing data
+required than what would be achievable through pushing data
from Hadoop. It is possible to configure Pinot to consume data
from Kafka, reducing latency between events to the delay
between the event being produced and it being consumed by
@@ -314,7 +314,7 @@ Pinot, typically less than a second.
Ingesting data in real time through Kafka does not necessarily
replace the Hadoop data pipeline, but rather enhances it. If a
-table has both an offline and realtime variant with the same
+table has both offline and realtime variant with the same
name, the query processing for this table switches to hybrid
mode. In hybrid mode, Pinot keeps a high watermark of the time
column for the offline data, and when it receives a query, it
@@ -358,8 +358,8 @@ from, using a table definition such as this one:
},
"tableType":"REALTIME",
"tenants" : {
- "broker":"brokerOne",
- "server":"serverOne"
+ "broker":"DefaultTenant_BROKER",
+ "server":"DefaultTenant_SERVER"
},
"metadata": {
}