This is an automated email from the ASF dual-hosted git repository.
mcvsubbu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git
The following commit(s) were added to refs/heads/master by this push:
new e4bb117 ReadTheDocs documentation for Table Configs, Monitoring, and
Deployment (#3975)
e4bb117 is described below
commit e4bb11735f4a1985ccade7146f9516bac8d63dbe
Author: jgutmann <[email protected]>
AuthorDate: Wed Mar 27 15:52:38 2019 -0700
ReadTheDocs documentation for Table Configs, Monitoring, and Deployment
(#3975)
* First pass of 'Running Pinot in Production' docs
First pass of 'Running Pinot in Production' docs, which includes
sections on
* Deploying Pinot and the ordering
* Table configs and what each setting does
* Monitoring (still needs proper metric names added)
* Second pass at my Apache doc items
Changes include:
1. Breaking out the table config section from the "Running Pinot in
Production" section. I think this makes the doc easier to follow for
table configs. I have a section here which includes a sample offline and
realtime table config. I am not sure if there are some configs included
as a way for users to run a local quick start. If there is I can link to
those files in the repo, otherwise I can include the full configs on
this page for users.
2. Adding links to metrics which I am able to find in the code. I need
to collect feedback on what to do with the others - Remove or I need
pointers to where I can find them in the code.
* Updating metrics with names and URLs
There are two items I need before this is completed
1. I notice that the controller metrics appear to be generated dynamically
due to table names being present in most of the metrics. I need some advice
from the devs on how I should note these metric names so I can link to them
in the code. I want to make sure it is clear to the users.
2. I want to add sample table configs in the "in_production.rst"
section. I feel this would be best to be added in the git repo since
they will be useful for the users in the quick start guides. I need Jack
to tell me where in the repo I should put these files. Once they are in
the repo I will link to them from that section.
* Updates based on Dino's feedback
General cleanup and re-formatting based on feedback for my PR.
* A few minor changes
* Removing line numbers from metrics URLs
* Updating documentation based on PR feedback
* Modifications based on PR feedback
* Updating based on PR feedback
* Adding required table config sections
* Addressing PR comments
* Removing some un-needed headers
* Updating with more feedback from Subbu
---
docs/admin_guide.rst | 1 +
docs/in_production.rst | 122 +++++++++++++++++++++-----
docs/tableconfig_schema.rst | 209 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 309 insertions(+), 23 deletions(-)
diff --git a/docs/admin_guide.rst b/docs/admin_guide.rst
index 51c5ec0..c69ada6 100644
--- a/docs/admin_guide.rst
+++ b/docs/admin_guide.rst
@@ -26,6 +26,7 @@ Admin Guide
.. toctree::
:maxdepth: 1
+ tableconfig_schema
in_production
management_api
pinot_hadoop
diff --git a/docs/in_production.rst b/docs/in_production.rst
index 520137c..9201411 100644
--- a/docs/in_production.rst
+++ b/docs/in_production.rst
@@ -17,51 +17,127 @@
.. under the License.
..
-Running Pinot in production
+Running Pinot in Production
===========================
-Installing Pinot
-----------------
-
Requirements
~~~~~~~~~~~~
-* Java 8+
-* Several nodes with enough memory
-* A working installation of Zookeeper
+You will need the following in order to run pinot in production:
-Recommended environment
-~~~~~~~~~~~~~~~~~~~~~~~
-
-* Shared storage infrastructure (such as NFS)
-* Regular Zookeeper backups
-* HTTP load balancers (such as nginx/haproxy)
+* Hardware for controller/broker/servers as per your load
+* Working installation of Zookeeper that Pinot can use. We recommend setting
aside a path within Zookeeper and including that path in pinot.controller.zkStr.
Pinot will create its own cluster under this path (cluster name decided by
pinot.controller.helixClusterName)
+* Shared storage mounted on controllers (if you plan to have multiple
controllers for the same cluster). Alternatively, an implementation of PinotFS
that the Pinot hosts have access to.
+* HTTP load balancers for spraying queries across brokers (or other mechanism
to balance queries)
+* HTTP load balancers for spraying controller requests (e.g. segment push, or
other controller APIs) or other mechanisms for distribution of these requests.
Deploying Pinot
----------------
+~~~~~~~~~~~~~~~
+
+In general, when deploying Pinot services, it is best to adhere to a specific
order in which the various components are deployed. If a release contains
protocol or other significant changes, rolling it out in a predictable order
helps avoid failures caused by those changes.
-Direct deployment of Pinot
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+The ordering is as follows:
-Deployment of Pinot on Kubernetes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#. pinot-controller
+#. pinot-broker
+#. pinot-server
+#. pinot-minion
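
The ordering above can be sketched as a small helper that sorts whatever components a rollout touches into the recommended sequence. This is only an illustration: the function name ``rollout_plan`` is hypothetical and is not part of Pinot or of this commit.

```python
# Recommended deployment order, copied from the list above.
DEPLOY_ORDER = ["pinot-controller", "pinot-broker", "pinot-server", "pinot-minion"]


def rollout_plan(components):
    """Return the given components sorted into the recommended order.

    Hypothetical helper for illustration; raises ValueError on names that
    are not one of the four Pinot components listed above.
    """
    unknown = [c for c in components if c not in DEPLOY_ORDER]
    if unknown:
        raise ValueError(f"unknown Pinot component(s): {unknown}")
    return sorted(components, key=DEPLOY_ORDER.index)


if __name__ == "__main__":
    # A rollout that only touches servers and controllers still starts
    # with the controllers.
    print(rollout_plan(["pinot-server", "pinot-controller"]))
```

A deployment script could call this once and then upgrade each component type in turn, waiting for one type to become healthy before moving to the next.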
Managing Pinot
---------------
+~~~~~~~~~~~~~~
Creating tables
-~~~~~~~~~~~~~~~
+---------------
Updating tables
-~~~~~~~~~~~~~~~
+---------------
Uploading data
-~~~~~~~~~~~~~~
+--------------
Configuring realtime data ingestion
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-----------------------------------
Monitoring Pinot
~~~~~~~~~~~~~~~~
+Pinot exposes several metrics to monitor the service and ensure that pinot
users are not experiencing issues. In this section we discuss some of the key
metrics that are useful to monitor. A full list of metrics is available in the
`Metrics <customizations.html#metrics>`_ section.
+
+Pinot Server
+------------
+
+* Missing Segments - `NUM_MISSING_SEGMENTS
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java>`_
+
+ * Number of missing segments that the broker queried for (expected to be on
the server) but the server didn't have. This can be due to retention or a stale
routing table.
+
+* Query latency - `TOTAL_QUERY_TIME
<https://github.com/apache/incubator-pinot/blob/ce2d9ee9dc73b2d7273a63a4eede774eb024ea8f/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerQueryPhase.java>`_
+
+ * Total time taken by the server to process a query.
+
+* Query Execution Exceptions - `QUERY_EXECUTION_EXCEPTIONS
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java>`_
+
+ * The number of exceptions that occurred during query execution.
+
+* Realtime Consumption Status - `LLC_PARTITION_CONSUMING
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerGauge.java>`_
+
+ * This gives a binary value based on whether low-level consumption is
healthy (1) or unhealthy (0). It's important to ensure that at least one
replica of each partition is consuming.
+
+* Realtime Highest Offset Consumed - `HIGHEST_STREAM_OFFSET_CONSUMED
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerGauge.java>`_
+
+ * The highest offset which has been consumed so far.
+
+Pinot Broker
+------------
+
+* Incoming QPS (per broker) - `QUERIES
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+ * The rate at which an individual broker is receiving queries, in QPS.
+
+* Dropped Requests - `REQUEST_DROPPED_DUE_TO_SEND_ERROR
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_,
`REQUEST_DROPPED_DUE_TO_CONNECTION_ERROR
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_,
`REQUEST_DROPPED_DUE_TO_ACCESS_ERROR
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/ap
[...]
+
+ * These metrics indicate that a query was dropped, i.e. the processing of
that query was abandoned for some reason.
+
+* Partial Responses - `BROKER_RESPONSES_WITH_PARTIAL_SERVERS_RESPONDED
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+ * Indicates a count of partial responses. A partial response is when at
least 1 of the requested servers fails to respond to the query.
+
+* Table QPS quota exceeded - `QUERY_QUOTA_EXCEEDED
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+ * Binary metric which will indicate when the configured QPS quota for a
table is exceeded (1) or if there is capacity remaining (0).
+
+* Table QPS quota usage percent - `QUERY_QUOTA_CAPACITY_UTILIZATION_RATE
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerGauge.java>`_
+
+ * Percentage of the configured QPS quota being utilized.
+
+Pinot Controller
+----------------
+
+Many of the controller metrics include a table name and thus are dynamically
generated in the code. The metrics below point to the classes which generate
the corresponding metrics.
+
+To get the real metric name, the easiest route is to spin up a controller
instance, create a table with the desired name and look through the generated
metrics.
+
+.. todo::
+
+ Give a more detailed explanation of how metrics are generated, how to
identify real metrics names and where to find them in the code.
+
+* Percent Segments Available - `PERCENT_SEGMENTS_AVAILABLE
<https://github.com/apache/incubator-pinot/blob/ce2d9ee9dc73b2d7273a63a4eede774eb024ea8f/pinot-common/src/main/java/org/apache/pinot/common/metrics/ControllerGauge.java>`_
+
+ * Percentage of complete online replicas in external view as compared to
replicas in ideal state.
+
+* Segments in Error State - `SEGMENTS_IN_ERROR_STATE
<https://github.com/apache/incubator-pinot/blob/ce2d9ee9dc73b2d7273a63a4eede774eb024ea8f/pinot-common/src/main/java/org/apache/pinot/common/metrics/ControllerGauge.java>`_
+
+ * Number of segments in an ``ERROR`` state for a given table.
+
+* Last push delay - Generated in the `ValidationMetrics
<https://github.com/apache/incubator-pinot/blob/ce2d9ee9dc73b2d7273a63a4eede774eb024ea8f/pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java>`_
class.
+
+ * The time in hours since the last time an offline segment has been pushed
to the controller.
+
+* Percent of replicas up - `PERCENT_OF_REPLICAS
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ControllerGauge.java>`_
+
+ * Percentage of complete online replicas in external view as compared to
replicas in ideal state.
+
+* Table storage quota usage percent - `TABLE_STORAGE_QUOTA_UTILIZATION
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ControllerGauge.java>`_
+
+ * Shows how much of the table's storage quota is currently being used. The
metric is a percentage of the entire quota.
+
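
As a rough illustration of how the controller metrics above might feed an alert, the sketch below evaluates a dictionary of already-collected values. How the values are collected is out of scope here, and because the real metric names are generated per table, the short keys used below are an assumption. The function name ``controller_alerts`` and the 90 percent quota threshold are hypothetical.

```python
def controller_alerts(metrics, quota_warn_percent=90):
    """Return human-readable alerts for one table's controller metrics.

    `metrics` maps the short metric names used in this section to their
    current values. Missing keys are treated as healthy. The default
    threshold is an illustrative assumption, not a Pinot recommendation.
    """
    alerts = []
    if metrics.get("PERCENT_SEGMENTS_AVAILABLE", 100) < 100:
        alerts.append("some segments are unavailable")
    if metrics.get("SEGMENTS_IN_ERROR_STATE", 0) > 0:
        alerts.append("segments are in ERROR state")
    if metrics.get("TABLE_STORAGE_QUOTA_UTILIZATION", 0) >= quota_warn_percent:
        alerts.append("table storage quota is nearly exhausted")
    return alerts


if __name__ == "__main__":
    sample = {
        "PERCENT_SEGMENTS_AVAILABLE": 100,
        "SEGMENTS_IN_ERROR_STATE": 2,
        "TABLE_STORAGE_QUOTA_UTILIZATION": 95,
    }
    for alert in controller_alerts(sample):
        print(alert)
```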
diff --git a/docs/tableconfig_schema.rst b/docs/tableconfig_schema.rst
new file mode 100644
index 0000000..202564f
--- /dev/null
+++ b/docs/tableconfig_schema.rst
@@ -0,0 +1,209 @@
+..
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+..
+
+Table Config
+============
+
+Sample table config and descriptions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A sample table config is shown below which has sub-sections collapsed. The
sub-sections will be described individually in following sections. Further
links to feature specific documentation will be included where available.
+
+``tableName`` - Should only contain alphanumeric characters, hyphens ('-'),
or underscores ('_'). A double underscore ('__') is not allowed, as it is
reserved for other features within Pinot.
+
+``tableType`` - Indicates the type of the table. There are some settings
specific to each type. This will be clarified below as each sub-section is
explained.
+
+ * Allowed values:
+
+ * ``OFFLINE`` - An offline table is used to host data which might be
periodically uploaded - daily, weekly, etc. More information on `Offline Tables
<architecture.html#ingesting-offline-data>`_
+ * ``REALTIME`` - A realtime table is used to consume data from incoming
data streams and serve this data in a near-realtime manner. More information on
`Realtime Tables <architecture.html#ingesting-realtime-data>`_
+
+.. code-block:: none
+
+ {
+ "tableName": "myPinotTable",
+ "tableType": "REALTIME",
+ "segmentsConfig": {...},
+ "tableIndexConfig": {...},
+ "tenants": {...},
+ "routing": {...},
+ "task": {...},
+ "metadata": {...}
+ }
+
+Some sections are required, otherwise the table config will be rejected by
pinot-controller. The required sections are:
+
+* ``tableName``
+* ``tableType``
+* ``"segmentsConfig": {...}``
+* ``"tableIndexConfig": {...}``
+* ``"tenants": {...}``
+* ``"metadata": {...}``
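
The rules described so far (required sections, allowed ``tableType`` values, and the ``tableName`` character restrictions) can be checked before a config is sent to pinot-controller. The sketch below is a client-side approximation under those stated rules only. It is not the controller's actual validation logic, and ``validate_table_config`` is a hypothetical name.

```python
import re

# Required top-level keys, as listed above.
REQUIRED_SECTIONS = (
    "tableName", "tableType", "segmentsConfig",
    "tableIndexConfig", "tenants", "metadata",
)
ALLOWED_TABLE_TYPES = ("OFFLINE", "REALTIME")
# Alphanumerics, hyphens and underscores only.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def validate_table_config(config):
    """Return a list of problems found in a table config dict.

    Hypothetical pre-flight check; an empty list means no problem was
    found by these rules, not that the controller will accept the config.
    """
    problems = []
    for key in REQUIRED_SECTIONS:
        if key not in config:
            problems.append(f"missing required section: {key}")
    name = config.get("tableName", "")
    if name and (not _NAME_RE.fullmatch(name) or "__" in name):
        problems.append(f"invalid tableName: {name}")
    table_type = config.get("tableType")
    if table_type is not None and table_type not in ALLOWED_TABLE_TYPES:
        problems.append(f"invalid tableType: {table_type}")
    return problems


if __name__ == "__main__":
    config = {
        "tableName": "myPinotTable",
        "tableType": "REALTIME",
        "segmentsConfig": {},
        "tableIndexConfig": {},
        "tenants": {},
        "metadata": {},
    }
    print(validate_table_config(config))
```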
+
+Segments Config Section
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``segmentsConfig`` section has information about configuring the following:
+
+* Segment Retention - with the ``retentionTimeUnit`` and
``retentionTimeValue`` options. Retention is only applicable to tables with a
``segmentPushType`` of ``APPEND``.
+
+ * Allowed values:
+
+ * ``retentionTimeUnit`` - ``DAYS``
+ * ``retentionTimeValue`` - Positive integers
+
+* ``segmentPushFrequency`` - to indicate how frequently segments are uploaded.
+
+ * Allowed values - ``daily``, ``hourly``
+
+* ``segmentPushType`` - Indicates the type of push to the table.
+
+ * Allowed values:
+
+ * ``APPEND`` means new data will be pushed and appended to the current
data in the table. All realtime tables *must* be explicitly set to ``APPEND``.
+ * ``REFRESH`` will refresh the entire dataset contained within the table.
Segment retention is ignored when set to ``REFRESH``.
+
+* ``replication`` - Number of replicas of data in a table, used for offline
tables only.
+
+ * Allowed values - Positive integers
+
+* ``replicasPerPartition`` - Number of replicas of data in a table, used for
realtime LLC tables only.
+
+ * Allowed values - Positive integers
+
+* Time column - using ``timeColumnName`` and ``timeType``, this must match
what's configured in the preceding schema
+
+ * Allowed values - String, this must match the ``timeFieldSpec`` section in
the schema
+
+* Segment assignment strategy - Described more on the page `Customizing Pinot
<customizations.html#segment-assignment-strategies>`_
+
+
+.. code-block:: none
+
+ "segmentsConfig": {
+ "retentionTimeUnit": "DAYS",
+ "retentionTimeValue": "5",
+ "segmentPushFrequency": "daily",
+ "segmentPushType": "APPEND",
+ "replication": "3",
+ "replicasPerPartition": "3",
+ "schemaName": "myPinotSchema",
+ "timeColumnName": "daysSinceEpoch",
+ "timeType": "DAYS",
+ "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy"
+ },
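
The constraints described for ``segmentsConfig`` (positive integer values, the two push types, realtime tables requiring ``APPEND``, and retention being ignored under ``REFRESH``) can be captured in a small check. This is a sketch of those documented rules only; ``check_segments_config`` is a hypothetical helper and Pinot itself may enforce more, or different, conditions.

```python
def _is_positive_int(value):
    """True if value (string or number) parses to an integer greater than 0."""
    try:
        return int(value) > 0
    except (TypeError, ValueError):
        return False


def check_segments_config(table_type, segments_config):
    """Return warnings for a segmentsConfig dict (hypothetical helper)."""
    warnings = []
    push_type = segments_config.get("segmentPushType")
    if push_type not in ("APPEND", "REFRESH"):
        warnings.append("segmentPushType must be APPEND or REFRESH")
    if table_type == "REALTIME" and push_type != "APPEND":
        warnings.append("realtime tables must set segmentPushType to APPEND")
    has_retention = "retentionTimeValue" in segments_config
    if has_retention and not _is_positive_int(segments_config["retentionTimeValue"]):
        warnings.append("retentionTimeValue must be a positive integer")
    if has_retention and push_type == "REFRESH":
        warnings.append("retention is ignored when segmentPushType is REFRESH")
    # replication applies to offline tables, replicasPerPartition to realtime.
    replica_key = "replication" if table_type == "OFFLINE" else "replicasPerPartition"
    if replica_key in segments_config and not _is_positive_int(segments_config[replica_key]):
        warnings.append(f"{replica_key} must be a positive integer")
    return warnings


if __name__ == "__main__":
    sample = {
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "5",
        "segmentPushType": "APPEND",
        "replicasPerPartition": "3",
    }
    print(check_segments_config("REALTIME", sample))
```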
+
+Table Index Config Section
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``tableIndexConfig`` section has information about how to configure:
+
+* ``invertedIndexColumns`` - Indicates a list of real column names as
specified in the schema to create inverted indexes for. More info on indexes
can be found on the `Index Techniques <index_techniques.html>`_ page.
+
+ * Allowed values - String; string must match the column name in the
corresponding schema
+
+* ``noDictionaryColumns`` - Indicates a list of real column names as specified
in the schema. Column names present will **not** have a dictionary created.
More info on indexes can be found on the `Index Techniques
<index_techniques.html>`_ page.
+
+ * Allowed values - String; string must match the column name in the
corresponding schema
+
+* ``sortedColumn`` - Indicates a list of real column names as specified in the
schema. Data should be sorted based on the column names provided. This field
needs to be set only for realtime tables. For offline, if the data at source is
sorted, we will create a sorted index automatically. More info on indexes can
be found on the `Index Techniques <index_techniques.html>`_ page.
+
+ * Allowed values - String; string must match the column name in the
corresponding schema
+
+* ``aggregateMetrics`` - Switch for the aggregate metrics feature. This
feature will aggregate realtime stream data as it is consumed, where
applicable, in order to reduce segment sizes. We sum the metric column values
of all rows that have the same value for dimension columns and create one row
in a realtime segment for all such rows. This feature is only available on
REALTIME tables.
+
+ * Allowed values - ``true`` to enable, ``false`` to disable.
+
+.. todo::
+
+ Create a separate section to describe this feature and design, then link to
it from this config description
+
+* ``segmentPartitionConfig`` - Configures the Data Partitioning Strategy.
Further documentation on this feature is available in the `Data Partitioning
Strategies <customizations.html#data-partitioning-strategies>`_ section.
+* ``loadMode`` - Indicates how data will be loaded on pinot-server. Either
``"MMAP"`` or ``"HEAP"`` can be configured.
+
+ * Allowed values:
+
+ * ``MMAP`` - Configures pinot-server to load data segments to off-heap
memory.
+ * ``HEAP`` - Configures pinot-server to load data directly into direct
memory.
+
+* ``streamConfigs`` - This section is where the bulk of the settings specific
to only REALTIME tables are found. These options are explained in detail in the
`Pluggable Streams <pluggable_streams.html#pluggable-streams>`_ page.
+
+.. code-block:: none
+
+ "tableIndexConfig": {
+ "invertedIndexColumns": [],
+ "sortedColumn": [
+ "nameOfSortedColumn"
+ ],
+ "noDictionaryColumns": [
+ "nameOfNoDictionaryColumn"
+ ],
+ "aggregateMetrics": "true",
+ "segmentPartitionConfig": {
+ "columnPartitionMap": {
+ "contentId": {
+ "functionName": "murmur",
+ "numPartitions": 32
+ }
+ }
+ },
+ "loadMode": "MMAP",
+ "lazyLoad": "false",
+ "segmentFormatVersion": "v3",
+ "streamConfigs": {}
+ },
+
+Tenants Section
+~~~~~~~~~~~~~~~
+
+The ``tenants`` section has two main config fields in it. These fields are
used to configure which tenants are used within Helix.
+
+.. code-block:: none
+
+ "tenants": {
+ "broker": "brokerTenant",
+ "server": "serverTenant"
+ },
+
+Routing Section
+~~~~~~~~~~~~~~~
+
+The ``routing`` section specifies which routingTableBuilder will be used and
passes options specific to that builder. There is more information in the
`Routing Strategies <customizations.html#routing-strategies>`_ section.
+
+.. code-block:: none
+
+ "routing": {
+ "routingTableBuilderName": "PartitionAwareRealtime",
+ "routingTableBuilderOptions": {}
+ },
+
+Metadata Section
+~~~~~~~~~~~~~~~~
+
+The ``metadata`` section is used for passing special key-value pairs into
Pinot which will be stored with the table config inside of Pinot. There's more
info in the `Custom Configs <customizations.html#custom-configs>`_ section.
+
+.. code-block:: none
+
+ "metadata": {
+ "customConfigs": {
+ "specialConfig": "testValue",
+ "anotherSpecialConfig": "value"
+ }
+ }