This is an automated email from the ASF dual-hosted git repository.
jlli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.wiki.git
The following commit(s) were added to refs/heads/master by this push:
new 18844c9 Updated Architecture (markdown)
18844c9 is described below
commit 18844c9fa58d27644081710654c5aa1a76006928
Author: Jialiang Li <[email protected]>
AuthorDate: Mon Feb 4 11:47:06 2019 -0800
Updated Architecture (markdown)
---
Architecture.md | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/Architecture.md b/Architecture.md
index 852c980..537dd64 100644
--- a/Architecture.md
+++ b/Architecture.md
@@ -2,7 +2,7 @@
[[image2014-11-12 19-54-12.png]]
-Goal of Pinot is to provide analytics on any given data set. The input data
set may exists either in Hadoop or Kafka. At LinkedIn, most tracking data is
published into Kafka and it eventually moves to Hadoop via ETL process. In
order to provide fast analytics, Pinot organizes the data into columnar format
and make use of various indexing technologies such as bitmap, inverted index
etc. Data on Hadoop is converted into Index Segment via Map reduce jobs. Index
segments are then pushed to P [...]
+Goal of Pinot is to provide analytics on any given data set. The input data
set may exist either in Hadoop or Kafka. At LinkedIn, most tracking data is
published into Kafka and it eventually moves to Hadoop via ETL process. In
order to provide fast analytics, Pinot organizes the data into columnar format
and makes use of various indexing technologies such as bitmap, inverted index
etc. Data on Hadoop is converted into Index Segment via Map reduce jobs. Index
segments are then pushed to P [...]
### Data Flow
@@ -20,7 +20,7 @@ Real-time flow is slightly different from the Hadoop flow.
Real-time nodes direc
#### Query routing
-From a user point of view, all queries are sent at Pinot Broker. The user does
not have to worry about the real time and historical nodes. Pinot Broker is
smart enough to query realtime and historical nodes separately and merge the
results before sending back the response. For example, let's say user queries
**select count(*) from table where time > T** time stamp, Pinot understands
that time is a special dimension and breaks the query appropriately between
realtime and historical. If th [...]
+From a user point of view, all queries are sent at Pinot Broker. The user does
not have to worry about the real time and historical nodes. Pinot Broker is
smart enough to query realtime and historical nodes separately and merge the
results before sending back the response. For example, let's say user queries
**select count(*) from table where time > T** time stamp, Pinot understands
that time is a special dimension and breaks the query appropriately between
realtime and historical. If th [...]
### Pinot Components Architecture
@@ -40,13 +40,13 @@ Pinot team provides the library for generating the
segments. Pinot expects the i
##### Segment move from HDFS to NFS
-After Pinot segments are generated on Hadoop, they need to be transferred over
to online serving cluster. This is done via Hadoop server push job provided by
Pinot. This job is non map reduce java job that runs on the Hadoop gateway
machine via azkaban. It reads the data file from HDFS and performs a send file
HTTP POST on one of the endpoints exposed by the Pinot Controllers. The files
are then saved to NFS directories mounted on the controller nodes.
+After Pinot segments are generated on Hadoop, they need to be transferred over
to online serving cluster. This is done via Hadoop server push job provided by
Pinot. This job is non map reduce java job that runs on the Hadoop gateway
machine via Azkaban. It reads the data file from HDFS and performs a send file
HTTP POST on one of the endpoints exposed by the Pinot Controllers. The files
are then saved to NFS directories mounted on the controller nodes.
-After saving the segment to NFS, controller assigns the segment to one of the
pinot servers. The assignment of a segment to Pinot server is maintained and
managed by Helix.
+After saving the segment to NFS, controller assigns the segment to one of the
pinot servers. The assignment of a segment to Pinot server is maintained and
managed by [Helix](https://helix.apache.org/).
##### Segment move from NFS to Historical Node
-The segment to Pinot server assignment is stored in Helix Idealstate. Helix
monitors the liveness of a server, when the server starts up, Helix will notify
the pinot server about this segment. The metadata about this segment contains
the URI to fetch the segment and is stored in Helix Property Store backed by
Zookeeper. Pinot server downloads the segment file from the controller and
extracts its content on the local disk.
+The segment to Pinot server assignment is stored in Helix Idealstate. Helix
monitors the liveness of a server. When the server starts up, Helix will notify
the pinot server about this segment. The metadata about this segment contains
the URI to fetch the segment and is stored in Helix Property Store backed by
Zookeeper. Pinot server downloads the segment file from the controller and
extracts its content on the local disk.
##### Segment Loading
@@ -78,49 +78,49 @@ All Pinot Servers and Brokers are managed by Apache Helix.
Apache Helix is a gen
Helix divides nodes into 3 logical components based on their responsibilities:
-1. **Participant**: The nodes that actually host the distributed resources
-2. **Spectator**: The nodes that simply observe the current state of each
Participant and routes requests accordingly. Routers, for example, need to know
the instance on which a partition is hosted and its state in order to route the
request to the appropriate endpoint
-3. **Controller**: The node that observes and controls the Participant nodes.
It is responsible for coordinating all transitions in the cluster and ensuring
that state constraints are satisfied while maintaining cluster stability
+1. **Participant**: The nodes that actually host the distributed resources.
+2. **Spectator**: The nodes that simply observe the current state of each
Participant and routes requests accordingly. Routers, for example, need to know
the instance on which a partition is hosted and its state in order to route the
request to the appropriate endpoint.
+3. **Controller**: The node that observes and controls the Participant nodes.
It is responsible for coordinating all transitions in the cluster and ensuring
that state constraints are satisfied while maintaining cluster stability.
Pinot Terminology and its mapping to Helix Concepts. See [Pinot Core Concepts
and Terminology](Pinot-Core-Concepts-and-Terminology)
* **Pinot Segment:** This is modeled as **_Helix Partition_**. Each Pinot
Segment can have multiple copies referred to as Replicas.
* **Pinot Table**: Multiple Pinot segment are grouped into a logical entity
referred to as Pinot Table. All segments belonging to a Pinot Table have the
same schema.
* **Pinot Server**: This is modeled as a _**Helix Participant**_. Pinot
Server host the segments (Helix Partition) belonging to one or more Pinot Table
(Helix Resource).
-* **Pinot Broker**: This is modeled as a **Helix Spectator** that observes
the cluster for changes in the state of segments and Pinot Server. In order
support Multi tenancy in Pinot Brokers, Pinot Brokers are also modeled as Helix
Participants.
+* **Pinot Broker**: This is modeled as a **Helix Spectator** that observes
the cluster for changes in the state of segments and Pinot Server. In order to
support Multi tenancy in Pinot Brokers, Pinot Brokers are also modeled as Helix
Participants.
#####
[[image2015-5-17 13-32-28.png]]
-<span style="line-height: 1.4285715;">**Zookeeper**: Zookeeper is used to
store the state of the cluster. Its also used to store configuration needed for
Helix and Pinot. Only dynamic configuration that is specific to a use case such
as table schema, number of segments and other metadata is stored in Zookeeper.
Zookeeper is also used by Helix Controller to communicate with the PARTICIPANT
and SPECTATORS. Zookeeper is strongly consistent and fault tolerant. We
typically run 3 or5 Zookeepe [...]
+<span style="line-height: 1.4285715;">**Zookeeper**: Zookeeper is used to
store the state of the cluster. Its also used to store configuration needed for
Helix and Pinot. Only dynamic configuration that is specific to a use case such
as table schema, number of segments and other metadata is stored in Zookeeper.
Zookeeper is also used by Helix Controller to communicate with the PARTICIPANT
and SPECTATORS. Zookeeper is strongly consistent and fault tolerant. We
typically run 3 or 5 Zookeep [...]
-<span style="line-height: 1.4285715;"></span>**Pinot Controller:** All admin
commands such as creating allocating Pinot Server and Brokers for each use
case, Creating New Table or Uploading New Segments go through Pinot Controller.
Pinot Controller wraps Helix Controller within the same process. All Pinot
Admin Commands internally get translated to Helix Commands via Helix Admin
Apis.</span>
+<span style="line-height: 1.4285715;"></span>**Pinot Controller:** All admin
commands such as allocating Pinot Server and Brokers for each use case,
creating new table or uploading new segments go through Pinot Controller. Pinot
Controller wraps Helix Controller within the same process. All Pinot Admin
Commands internally get translated to Helix Commands via Helix Admin
Apis.</span>
-* <span style="line-height: 1.4285715;">Allocate Pinot Server/Broker: This
commands is run when we on board new use case or to allocate additional
capacity to an existing use case. This parameter simply takes in the #1 use
case name X #2\. number of Pinot Servers S and number of Brokers B needed for
the use case. Pinot Controllers uses Helix Tagging Api to tag S Server
Instances and B Brokers Instances in the cluster as X. This means all
subsequent tables that belong to use case X will [...]
+* <span style="line-height: 1.4285715;">Allocate Pinot Server/Broker: This
commands is run when we on board new use case or allocate additional capacity
to an existing use case. This parameter simply takes in the #1 use case name X
#2\. number of Pinot Servers S and number of Brokers B needed for the use case.
Pinot Controllers use Helix Tagging Api to tag S Server Instances and B Brokers
Instances in the cluster as X. This means all subsequent tables that belong to
use case X will be [...]
* <span style="line-height: 1.4285715;">Create Table: This will create an
Empty IdealState for a Table. This table must also be tagged as X which means
all segments of this table will be allocated to Instances that have the same
tag X. Additional metadata such as table retention, allocation strategy etc is
stored in Zookeeper using Helix Property Store Api.</span>
-* <span style="line-height: 1.4285715;">Upload Segment: Pinot Controller
adds an segment entry to the table IdealState. The number of entries added is
according the number of replicas configured for the Table T. While its possible
to let Helix decide the assignment of Segment to Pinot Server Instance by using
AUTO Idealstate mode, in the current version we use the CUSTOM Idealstate mode.
See [Helix Idealstate
Mode](http://helix.apache.org/0.6.4-docs/tutorial_rebalance.html) for additio
[...]
+* <span style="line-height: 1.4285715;">Upload Segment: Pinot Controller
adds an segment entry to the table IdealState. The number of entries added is
according the number of replicas configured for the Table T. While it's
possible to have Helix decide the assignment of Segment to Pinot Server
Instance by using AUTO Idealstate mode, in the current version we use the
CUSTOM Idealstate mode. See [Helix Idealstate
Mode](http://helix.apache.org/0.6.4-docs/tutorial_rebalance.html) for addit
[...]
-<span style="line-height: 1.4285715;"></span>**Helix Controller:** As
explained in previous section all Pinot Admin commands simply get translated
into Helix Admin Commands. The Helix commands in turn update the metadata
stored in Zookeeper. Helix Controller acts as the brain of the system and
translates all metadata changes into a set of actions and is responsible for
the execution of these actions on the respective participants. This is achieved
via State Transitions. See [Helix Archit [...]
+<span style="line-height: 1.4285715;"></span>**Helix Controller:** As
explained in previous section all Pinot Admin commands simply get translated
into Helix Admin Commands. The Helix commands in turn update the metadata
stored in Zookeeper. Helix Controller acts as the brain of the system and
translates all metadata changes into a set of actions and is responsible for
the execution of these actions on the respective participants. This is achieved
via State Transitions. See [Helix Archit [...]
See [Multi tenancy in Pinot
2.0](https://github.com/linkedin/pinot/wiki/Multitenancy#multi-tenancy-in-pinot-20)
for more info on how Multi tenancy is solved in Pinot 2.0.
#### Broker Node
-The responsibility of Broker is to route a given query to appropriate Pinot
Server instances, collect the responses and merge the responses into final
result and send it back to the client. The two main steps involved in this are
+The responsibility of Broker is to route a given query to appropriate Pinot
Server instances, collect and merge the responses into final result and send it
back to the client. The two main steps involved in this are
-**service discovery**: Service discovery is the mechanism of knowing what
Tables are hosted in the cluster and location of the Table Segments and the
time range for each segment. As explained in the previous section, this is
derived from the information stored in Zookeeper via Helix library. Broker uses
this information not only to compute the subset of the nodes to send the
request to, it prunes the number of segments to be queries. This is achieved by
looking at the time range in the q [...]
+**service discovery**: Service discovery is the mechanism of knowing what
Tables are hosted in the cluster and location of the Table Segments and the
time range for each segment. As explained in the previous section, this is
derived from the information stored in Zookeeper via Helix library. Broker uses
this information not only to compute the subset of the nodes to send the
request to, but also prunes the number of segments to be queries. This is
achieved by looking at the time range in [...]
-**Scatter gather:** Once the broker computes the set of nodes to route the
query, the requests are routed to the respective Pinot Server nodes. Each
server nodes processes the query and returns the response, broker will merge
the responses from individual Server and returns the response to the client.
The merging logic is dependent on the query selection with Limit, aggregation,
group by top K etc. If any of the servers fails to process the query or Time's
out, broker will return partial [...]
+**Scatter gather:** Once the broker computes the set of nodes to route the
query, the requests are routed to the respective Pinot Server nodes. Each
server node processes the query and returns the response. Broker will merge the
responses from individual Server and returns the response back to the client.
The merging logic is dependent on the query selection with limit, aggregation,
group by top K etc. If any of the servers fails to process the query or Time's
out, broker will return par [...]
### Pinot Index Segment
[[image2015-5-17 17-59-10.png]]
-Pinot Index Segment is the columnar representation of the raw data. The raw
data is generally represented in a row oriented format which can be AVRO, JSON,
CSV etc. Converting row oriented format into columnar can reduce storage space
and allow fast scan for specific columns. Row oriented format is efficient when
query is either updating or reading a specific row in the data. This is
typically the case with OLTP use cases where relational databases such as
Oracle, MySQL etc is used. Colu [...]
+Pinot Index Segment is the columnar representation of the raw data. The raw
data is generally represented in a row oriented format which can be AVRO, JSON,
CSV etc. Converting row oriented format into columnar can reduce storage space
and allow fast scan for specific columns. Row oriented format is efficient when
query is either updating or reading a specific row in the data. This is
typically the case with OLTP use cases where relational databases such as
Oracle, MySQL etc is used. Colu [...]
-Columnar format also provides storage related benefits. Some of the columns
contains values that are repetitive. For e.g. if the column is of type country
then storing in row oriented format will require space proportional
varchar(100) * number of rows. Columnar format can apply various encoding such
as Fixed Bit and on top of that, apply compression algorithm to compress the
data further. While these techniques can be applied for row oriented storage as
well, the encoding and compressio [...]
+Columnar format also provides storage related benefits. Some of the columns
contain values that are repetitive. For e.g. if the column is of type country
then storing in row oriented format will require space proportional
varchar(100) * number of rows. Columnar format can apply various encoding such
as Fixed Bit and on top of that, applying compression algorithm to compress the
data further. While these techniques can be applied for row oriented storage as
well, the encoding and compress [...]
-Columnar formats definitely have their down sides, creating efficient columnar
formats typically take time and once created they cannot be mutated easily.
While this might a problem for OLTP workloads, typical OLAP use cases consists
of TimeSeries data which is immutable.
+Columnar formats definitely have their down sides, creating efficient columnar
formats typically take time and once created they cannot be mutated easily.
While this might be a problem for OLTP workloads, typical OLAP use cases
consists of TimeSeries data which is immutable.
#### Anatomy of Index Segment