[incubator-pinot.wiki] branch master updated: Updated Architecture (markdown)

jlli Mon, 04 Feb 2019 11:47:26 -0800

This is an automated email from the ASF dual-hosted git repository.

jlli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.wiki.git



The following commit(s) were added to refs/heads/master by this push:
     new 18844c9  Updated Architecture (markdown)
18844c9 is described below

commit 18844c9fa58d27644081710654c5aa1a76006928
Author: Jialiang Li <[email protected]>
AuthorDate: Mon Feb 4 11:47:06 2019 -0800

    Updated Architecture (markdown)
---
 Architecture.md | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/Architecture.md b/Architecture.md
index 852c980..537dd64 100644
--- a/Architecture.md
+++ b/Architecture.md
@@ -2,7 +2,7 @@
 
 [[image2014-11-12 19-54-12.png]]
 
-Goal of Pinot is to provide analytics on any given data set. The input data 
set may exists either in Hadoop or Kafka. At LinkedIn, most tracking data is 
published into Kafka and it eventually moves to Hadoop via ETL process. In 
order to provide fast analytics, Pinot organizes the data into columnar format 
and make use of various indexing technologies such as bitmap, inverted index 
etc. Data on Hadoop is converted into Index Segment via Map reduce jobs. Index 
segments are then pushed to P [...]
+Goal of Pinot is to provide analytics on any given data set. The input data 
set may exist either in Hadoop or Kafka. At LinkedIn, most tracking data is 
published into Kafka and it eventually moves to Hadoop via ETL process. In 
order to provide fast analytics, Pinot organizes the data into columnar format 
and makes use of various indexing technologies such as bitmap, inverted index 
etc. Data on Hadoop is converted into Index Segment via Map reduce jobs. Index 
segments are then pushed to P [...]
 
 ### Data Flow
 
@@ -20,7 +20,7 @@ Real-time flow is slightly different from the Hadoop flow. 
Real-time nodes direc
 
 #### Query routing
 
-From a user point of view, all queries are sent at Pinot Broker. The user does 
not have to worry about the real time and historical nodes. Pinot Broker is 
smart enough to query realtime and historical nodes separately and merge the 
results before sending back the response. For example, let's say user queries 
**select count(*) from table where time > T** time stamp, Pinot understands 
that time is a special dimension and breaks the query appropriately between 
realtime and historical. If th [...]
+From a user point of view, all queries are sent at Pinot Broker. The user does 
not have to worry about the real time and historical nodes. Pinot Broker is 
smart enough to query realtime and historical nodes separately and merge the 
results before sending back the response. For example, let's say user queries 
**select count(*) from table where time > T** time stamp, Pinot understands 
that time is a special dimension and breaks the query appropriately between 
realtime and historical. If th [...]
 
 ### Pinot Components Architecture
 
@@ -40,13 +40,13 @@ Pinot team provides the library for generating the 
segments. Pinot expects the i
 
 ##### Segment move from HDFS to NFS
 
-After Pinot segments are generated on Hadoop, they need to be transferred over 
to online serving cluster. This is done via Hadoop server push job provided by 
Pinot. This job is non map reduce java job that runs on the Hadoop gateway 
machine via azkaban. It reads the data file from HDFS and performs a send file 
HTTP POST on one of the endpoints exposed by the Pinot Controllers. The files 
are then saved to NFS directories mounted on the controller nodes.
+After Pinot segments are generated on Hadoop, they need to be transferred over 
to online serving cluster. This is done via Hadoop server push job provided by 
Pinot. This job is non map reduce java job that runs on the Hadoop gateway 
machine via Azkaban. It reads the data file from HDFS and performs a send file 
HTTP POST on one of the endpoints exposed by the Pinot Controllers. The files 
are then saved to NFS directories mounted on the controller nodes.
 
-After saving the segment to NFS, controller assigns the segment to one of the 
pinot servers. The assignment of a segment to Pinot server is maintained and 
managed by Helix.
+After saving the segment to NFS, controller assigns the segment to one of the 
pinot servers. The assignment of a segment to Pinot server is maintained and 
managed by [Helix](https://helix.apache.org/).
 
 ##### Segment move from NFS to Historical Node
 
-The segment to Pinot server assignment is stored in Helix Idealstate. Helix 
monitors the liveness of a server, when the server starts up, Helix will notify 
the pinot server about this segment. The metadata about this segment contains 
the URI to fetch the segment and is stored in Helix Property Store backed by 
Zookeeper. Pinot server downloads the segment file from the controller and 
extracts its content on the local disk.
+The segment to Pinot server assignment is stored in Helix Idealstate. Helix 
monitors the liveness of a server. When the server starts up, Helix will notify 
the pinot server about this segment. The metadata about this segment contains 
the URI to fetch the segment and is stored in Helix Property Store backed by 
Zookeeper. Pinot server downloads the segment file from the controller and 
extracts its content on the local disk.
 
 ##### Segment Loading
 
@@ -78,49 +78,49 @@ All Pinot Servers and Brokers are managed by Apache Helix. 
Apache Helix is a gen
 
 Helix divides nodes into 3 logical components based on their responsibilities:
 
-1.  **Participant**: The nodes that actually host the distributed resources
-2.  **Spectator**: The nodes that simply observe the current state of each 
Participant and routes requests accordingly. Routers, for example, need to know 
the instance on which a partition is hosted and its state in order to route the 
request to the appropriate endpoint
-3.  **Controller**: The node that observes and controls the Participant nodes. 
It is responsible for coordinating all transitions in the cluster and ensuring 
that state constraints are satisfied while maintaining cluster stability
+1.  **Participant**: The nodes that actually host the distributed resources.
+2.  **Spectator**: The nodes that simply observe the current state of each 
Participant and routes requests accordingly. Routers, for example, need to know 
the instance on which a partition is hosted and its state in order to route the 
request to the appropriate endpoint.
+3.  **Controller**: The node that observes and controls the Participant nodes. 
It is responsible for coordinating all transitions in the cluster and ensuring 
that state constraints are satisfied while maintaining cluster stability.
 
 Pinot Terminology and its mapping to Helix Concepts. See [Pinot Core Concepts 
and Terminology](Pinot-Core-Concepts-and-Terminology)
 
 *   **Pinot Segment:** This is modeled as **_Helix Partition_**. Each Pinot 
Segment can have multiple copies referred to as Replicas.
 *   **Pinot Table**: Multiple Pinot segment are grouped into a logical entity 
referred to as Pinot Table. All segments belonging to a Pinot Table have the 
same schema.
 *   **Pinot Server**: This is modeled as a _**Helix Participant**_. Pinot 
Server host the segments (Helix Partition) belonging to one or more Pinot Table 
(Helix Resource).
-*   **Pinot Broker**: This is modeled as a **Helix Spectator** that observes 
the cluster for changes in the state of segments and Pinot Server. In order 
support Multi tenancy in Pinot Brokers, Pinot Brokers are also modeled as Helix 
Participants.
+*   **Pinot Broker**: This is modeled as a **Helix Spectator** that observes 
the cluster for changes in the state of segments and Pinot Server. In order to 
support Multi tenancy in Pinot Brokers, Pinot Brokers are also modeled as Helix 
Participants.
 
 ##### 
 [[image2015-5-17 13-32-28.png]]
 
-<span style="line-height: 1.4285715;">**Zookeeper**: Zookeeper is used to 
store the state of the cluster. Its also used to store configuration needed for 
Helix and Pinot. Only dynamic configuration that is specific to a use case such 
as table schema, number of segments and other metadata is stored in Zookeeper. 
Zookeeper is also used by Helix Controller to communicate with the PARTICIPANT 
and SPECTATORS. Zookeeper is strongly consistent and fault tolerant. We 
typically run 3 or5 Zookeepe [...]
+<span style="line-height: 1.4285715;">**Zookeeper**: Zookeeper is used to 
store the state of the cluster. Its also used to store configuration needed for 
Helix and Pinot. Only dynamic configuration that is specific to a use case such 
as table schema, number of segments and other metadata is stored in Zookeeper. 
Zookeeper is also used by Helix Controller to communicate with the PARTICIPANT 
and SPECTATORS. Zookeeper is strongly consistent and fault tolerant. We 
typically run 3 or 5 Zookeep [...]
 
-<span style="line-height: 1.4285715;"></span>**Pinot Controller:** All admin 
commands such as creating allocating Pinot Server and Brokers for each use 
case, Creating New Table or Uploading New Segments go through Pinot Controller. 
Pinot Controller wraps Helix Controller within the same process. All Pinot 
Admin Commands internally get translated to Helix Commands via Helix Admin 
Apis.</span>
+<span style="line-height: 1.4285715;"></span>**Pinot Controller:** All admin 
commands such as allocating Pinot Server and Brokers for each use case, 
creating new table or uploading new segments go through Pinot Controller. Pinot 
Controller wraps Helix Controller within the same process. All Pinot Admin 
Commands internally get translated to Helix Commands via Helix Admin 
Apis.</span>
 
-*   <span style="line-height: 1.4285715;">Allocate Pinot Server/Broker: This 
commands is run when we on board new use case or to allocate additional 
capacity to an existing use case. This parameter simply takes in the #1 use 
case name X #2\. number of Pinot Servers S and number of Brokers B needed for 
the use case. Pinot Controllers uses Helix Tagging Api to tag S Server 
Instances and B Brokers Instances in the cluster as X. This means all 
subsequent tables that belong to use case X will [...]
+*   <span style="line-height: 1.4285715;">Allocate Pinot Server/Broker: This 
commands is run when we on board new use case or allocate additional capacity 
to an existing use case. This parameter simply takes in the #1 use case name X 
#2\. number of Pinot Servers S and number of Brokers B needed for the use case. 
Pinot Controllers use Helix Tagging Api to tag S Server Instances and B Brokers 
Instances in the cluster as X. This means all subsequent tables that belong to 
use case X will be  [...]
 *   <span style="line-height: 1.4285715;">Create Table: This will create an 
Empty IdealState for a Table. This table must also be tagged as X which means 
all segments of this table will be allocated to Instances that have the same 
tag X. Additional metadata such as table retention, allocation strategy etc is 
stored in Zookeeper using Helix Property Store Api.</span>
-*   <span style="line-height: 1.4285715;">Upload Segment: Pinot Controller 
adds an segment entry to the table IdealState. The number of entries added is 
according the number of replicas configured for the Table T. While its possible 
to let Helix decide the assignment of Segment to Pinot Server Instance by using 
AUTO Idealstate mode, in the current version we use the CUSTOM Idealstate mode. 
See [Helix Idealstate 
Mode](http://helix.apache.org/0.6.4-docs/tutorial_rebalance.html) for additio 
[...]
+*   <span style="line-height: 1.4285715;">Upload Segment: Pinot Controller 
adds an segment entry to the table IdealState. The number of entries added is 
according the number of replicas configured for the Table T. While it's 
possible to have Helix decide the assignment of Segment to Pinot Server 
Instance by using AUTO Idealstate mode, in the current version we use the 
CUSTOM Idealstate mode. See [Helix Idealstate 
Mode](http://helix.apache.org/0.6.4-docs/tutorial_rebalance.html) for addit 
[...]
 
-<span style="line-height: 1.4285715;"></span>**Helix Controller:** As 
explained in previous section all Pinot Admin commands simply get translated 
into Helix Admin Commands. The Helix commands in turn update the metadata 
stored in Zookeeper. Helix Controller acts as the brain of the system and 
translates all metadata changes into a set of actions and is responsible for 
the execution of these actions on the respective participants. This is achieved 
via State Transitions. See [Helix Archit [...]
+<span style="line-height: 1.4285715;"></span>**Helix Controller:** As 
explained in previous section all Pinot Admin commands simply get translated 
into Helix Admin Commands. The Helix commands in turn update the metadata 
stored in Zookeeper. Helix Controller acts as the brain of the system and 
translates all metadata changes into a set of actions and is responsible for 
the execution of these actions on the respective participants. This is achieved 
via State Transitions. See [Helix Archit [...]
 
 See [Multi tenancy in Pinot 
2.0](https://github.com/linkedin/pinot/wiki/Multitenancy#multi-tenancy-in-pinot-20)
 for more info on how Multi tenancy is solved in Pinot 2.0.
 
 #### Broker Node
 
-The responsibility of Broker is to route a given query to appropriate Pinot 
Server instances, collect the responses and merge the responses into final 
result and send it back to the client. The two main steps involved in this are
+The responsibility of Broker is to route a given query to appropriate Pinot 
Server instances, collect and merge the responses into final result and send it 
back to the client. The two main steps involved in this are
 
-**service discovery**: Service discovery is the mechanism of knowing what 
Tables are hosted in the cluster and location of the Table Segments and the 
time range for each segment. As explained in the previous section, this is 
derived from the information stored in Zookeeper via Helix library. Broker uses 
this information not only to compute the subset of the nodes to send the 
request to, it prunes the number of segments to be queries. This is achieved by 
looking at the time range in the q [...]
+**service discovery**: Service discovery is the mechanism of knowing what 
Tables are hosted in the cluster and location of the Table Segments and the 
time range for each segment. As explained in the previous section, this is 
derived from the information stored in Zookeeper via Helix library. Broker uses 
this information not only to compute the subset of the nodes to send the 
request to, but also prunes the number of segments to be queries. This is 
achieved by looking at the time range in [...]
 
-**Scatter gather:** Once the broker computes the set of nodes to route the 
query, the requests are routed to the respective Pinot Server nodes. Each 
server nodes processes the query and returns the response, broker will merge 
the responses from individual Server and returns the response to the client. 
The merging logic is dependent on the query selection with Limit, aggregation, 
group by top K etc. If any of the servers fails to process the query or Time's 
out, broker will return partial [...]
+**Scatter gather:** Once the broker computes the set of nodes to route the 
query, the requests are routed to the respective Pinot Server nodes. Each 
server node processes the query and returns the response. Broker will merge the 
responses from individual Server and returns the response back to the client. 
The merging logic is dependent on the query selection with limit, aggregation, 
group by top K etc. If any of the servers fails to process the query or Time's 
out, broker will return par [...]
 
 ### Pinot Index Segment
 
 [[image2015-5-17 17-59-10.png]]
 
-Pinot Index Segment is the columnar representation of the raw data. The raw 
data is generally represented in a row oriented format which can be AVRO, JSON, 
CSV etc. Converting row oriented format into columnar can reduce storage space 
and allow fast scan for specific columns. Row oriented format is efficient when 
query is either updating or reading a specific row in the data. This is 
typically the case with OLTP use cases where relational databases such as 
Oracle, MySQL etc is used. Colu [...]
+Pinot Index Segment is the columnar representation of the raw data. The raw 
data is generally represented in a row oriented format which can be AVRO, JSON, 
CSV etc. Converting row oriented format into columnar can reduce storage space 
and allow fast scan for specific columns. Row oriented format is efficient when 
query is either updating or reading a specific row in the data. This is 
typically the case with OLTP use cases where relational databases such as 
Oracle, MySQL etc is used. Colu [...]
 
-Columnar format also provides storage related benefits. Some of the columns 
contains values that are repetitive. For e.g. if the column is of type country 
then storing in row oriented format will require space proportional 
varchar(100) * number of rows. Columnar format can apply various encoding such 
as Fixed Bit and on top of that, apply compression algorithm to compress the 
data further. While these techniques can be applied for row oriented storage as 
well, the encoding and compressio [...]
+Columnar format also provides storage related benefits. Some of the columns 
contain values that are repetitive. For e.g. if the column is of type country 
then storing in row oriented format will require space proportional 
varchar(100) * number of rows. Columnar format can apply various encoding such 
as Fixed Bit and on top of that, applying compression algorithm to compress the 
data further. While these techniques can be applied for row oriented storage as 
well, the encoding and compress [...]
 
-Columnar formats definitely have their down sides, creating efficient columnar 
formats typically take time and once created they cannot be mutated easily. 
While this might a problem for OLTP workloads, typical OLAP use cases consists 
of TimeSeries data which is immutable.
+Columnar formats definitely have their down sides, creating efficient columnar 
formats typically take time and once created they cannot be mutated easily. 
While this might be a problem for OLTP workloads, typical OLAP use cases 
consists of TimeSeries data which is immutable.
 
 #### Anatomy of Index Segment
 


