This is an automated email from the ASF dual-hosted git repository.

mcvsubbu pushed a commit to branch 0.2.0
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git
commit 43214a82dbcd6f53bb4d1e5143eaf5016523b990
Author: Subbu Subramaniam <[email protected]>
AuthorDate: Mon Nov 4 14:22:23 2019 -0800

    Manual merge of 49a65a3987a78fce2a0851a65eb65762b4f3003d for docs

    The original commit included a doc file and source files that were not
    related to the documentation change. Doing a manual merge of the
    documentation file alone
---
 docs/architecture.rst | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/docs/architecture.rst b/docs/architecture.rst
index 7c285ca..c7fc486 100644
--- a/docs/architecture.rst
+++ b/docs/architecture.rst
@@ -84,7 +84,7 @@ Pinot Tables
 ------------
 
 Pinot supports realtime, or offline, or hybrid tables. Data in Pinot tables is contained in the segments
-belonging to that table. A Pinot table is modeled as a Helix resource. Each segment of a table is modeled as a Helix Partition,
+belonging to that table. A Pinot table is modeled as a Helix resource. Each segment of a table is modeled as a Helix Partition.
 
 Table Schema defines column names and their metadata. Table configuration and schema is stored in zookeeper.
 
@@ -109,7 +109,7 @@ Ingesting Offline data
 Segments for offline tables are constructed outside of Pinot, typically in Hadoop via map-reduce jobs and
 ingested into Pinot via REST API provided by the Controller. Pinot provides libraries to create Pinot segments
 out of input files in AVRO, JSON or CSV formats in a hadoop job, and push
-the constructed segments to the controlers via REST APIs.
+the constructed segments to the controllers via REST APIs.
 
 When an Offline segment is ingested, the controller looks up the table's configuration and assigns the segment
 to the servers that host the table. It may assign multiple servers for each segment depending on the number of replicas
@@ -126,7 +126,7 @@ start include the new segments for queries.
 
 Brokers support different routing strategies depending on the type of table, the segment assignment strategy and
 the use case.
 
-Data in offline segments are immmutable (Rows cannot be added, deleted, or modified). However, segments may be replaced with modified data.
+Data in offline segments are immutable (Rows cannot be added, deleted, or modified). However, segments may be replaced with modified data.
 
 .. _ingesting-realtime-data:
 
@@ -143,17 +143,17 @@
 including all partitions (or, just from one partition).
 
 A pinot table can be configured to consume from streams in one of two modes:
 
 * ``LowLevel``: This is the preferred mode of consumption. Pinot creates independent partition-level consumers for
-  each partition. Depending on the the configured number of replicas, multiple consumers may be created for
+  each partition. Depending on the the configured number of replicas, multiple consumers may be created for
   each partition, taking care that no two replicas exist on the same server host. Therefore you need to provision
   *at least* as many hosts as the number of replcias configured.
 
 * ``HighLevel``: Pinot creates *one* stream-level consumer that consumes from all partitions. Each message consumed
   could be from any of the partitions of the stream. Depending on the configured number of replicas, multiple
-  stream-level consumers are created, taking care that no two replicas exist on the same server host. Therefore
+  stream-level consumers are created, taking care that no two replicas exist on the same server host. Therefore
   you need to provision exactly as many hosts as the number of replicas configured.
 
 Of course, the underlying stream should support either mode of consumption in order for a Pinot table to use that
-mode. Kafka has support for both of these modes. See :ref:`pluggable-streams` for more information on support of other
+mode. Kafka has support for both of these modes.
See :ref:`pluggable-streams` for more information on support of other
 data streams in Pinot.
 
 In either mode, Pinot servers store the ingested rows in volatile memory until either one of the following
 conditions are met:
 
@@ -179,7 +179,7 @@ easy and automated mechanisms for replacing pinot servers, or expanding capacity
 that ensure that the completed segment is equivalent across all replicas. In ``HighLevel`` mode, the servers
 persist the consumed rows into local store (and **not** the segment store). Since consumption of rows
-can be from any partition, it is not possible to guarantee equivalence of segments across replicas.
+can be from any partition, it is not possible to guarantee equivalence of segments across replicas.
 
 See `Consuming and Indexing rows in Realtime <https://cwiki.apache.org/confluence/display/PINOT/Consuming+and+Indexing+rows+in+Realtime>`_
 for details.
 
@@ -187,10 +187,12 @@ See `Consuming and Indexing rows in Realtime <https://cwiki.apache.org/confluenc
 
 Pinot Segments
 --------------
 
-A segment is laid out in a columnar format
-so that it can be directly mapped into memory for serving queries. Columns may be single or multi-valued. Column types may be
+A segment is laid out in a columnar format so that it can be directly mapped into memory for serving queries.
+
+Columns may be single or multi-valued. Column types may be
 STRING, INT, LONG, FLOAT, DOUBLE or BYTES. Columns may be declared to be metric or dimension (or specifically as a time dimension)
-in the schema.
+in the schema. Columns can have default null value. For example, the default null value of a integer column can be 0.
+Note: The default value of byte column has to be hex-encoded before adding to the schema.
 
 Pinot uses dictionary encoding to store values as a dictionary ID. Columns may be configured to be "no-dictionary" column in which
 case raw values are stored.
 Dictionary IDs are encoded using minimum number of bits for efficient storage (*e.g.* a column with cardinality
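The dictionary-encoding point in the diff above (dictionary IDs stored with the minimum number of bits for a column's cardinality) can be illustrated with a short sketch. This is a hypothetical helper for illustration only, not Pinot's actual implementation: a column with cardinality N needs ceil(log2(N)) bits per dictionary ID.

```python
# Hypothetical illustration of dictionary-ID bit-packing; not Pinot source code.
def bits_per_dictionary_id(cardinality: int) -> int:
    """Minimum bits needed to represent dictionary IDs 0..cardinality-1."""
    if cardinality <= 1:
        return 1  # even a constant column uses at least one bit per value
    # IDs range from 0 to cardinality-1, so the widest ID sets the bit width
    return (cardinality - 1).bit_length()

# A column with cardinality 3 has IDs 0, 1, 2 and needs only 2 bits per value,
# versus 32 bits for a raw INT column.
print(bits_per_dictionary_id(3))       # 2
print(bits_per_dictionary_id(100000))  # 17
```

This is why dictionary encoding pays off for low-cardinality columns, while "no-dictionary" (raw) storage can be preferable for high-cardinality ones.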
