This is an automated email from the ASF dual-hosted git repository.

mcvsubbu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git


The following commit(s) were added to refs/heads/master by this push:
     new e4bb117  ReadTheDocs documentation for Table Configs, Monitoring, and 
Deployment (#3975)
e4bb117 is described below

commit e4bb11735f4a1985ccade7146f9516bac8d63dbe
Author: jgutmann <[email protected]>
AuthorDate: Wed Mar 27 15:52:38 2019 -0700

    ReadTheDocs documentation for Table Configs, Monitoring, and Deployment 
(#3975)
    
    * First pass of 'Running Pinot in Production' docs
    
    First pass of 'Running Pinot in Production' docs, which includes
    sections on
    
    * Deploying Pinot and the ordering
    * Table configs and what each setting does
    * Monitoring (still needs proper metric names added)
    
    * Second pass at my Apache doc items
    
    Changes include:
    
    1. Breaking out the table config section from the "Running Pinot in
    Production" section. I think this makes the doc easier to follow for
    table configs. I have a section here which includes a sample offline and
    realtime table config. I am not sure if there are some configs included
    as a way for users to run a local quick start. If there are, I can link to
    those files in the repo, otherwise I can include the full configs on
    this page for users.
    
    2. Adding links to metrics which I am able to find in the code. I need
    to collect feedback on what to do with the others - Remove or I need
    pointers to where I can find them in the code.
    
    * Updating metrics with names and URLs
    
    There are two items I need before this is completed
    
    1. I notice that the controller metrics appear to be generated dynamically
    due to table names being present in most of the metrics. I need some advice
    from the devs on how I should note these metric names so I can link to them
    in the code. I want to make sure it is clear to the users.
    
    2. I want to add sample table configs in the "in_production.rst"
    section. I feel this would be best to be added in the git repo since
    they will be useful for the users in the quick start guides. I need Jack
    to tell me where in the repo I should put these files. Once they are in
    the repo I will link to them from that section.
    
    * Updates based on Dino's feedback
    
    General cleanup and re-formatting based on feedback for my PR.
    
    * A few minor changes
    
    * Removing line numbers from metrics URLs
    
    * Updating documentation based on PR feedback
    
    * Modifications based on PR feedback
    
    * Updating based on PR feedback
    
    * Adding required table config sections
    
    * Addressing PR comments
    
    * Removing some un-needed headers
    
    * Updating with more feedback from Subbu
---
 docs/admin_guide.rst        |   1 +
 docs/in_production.rst      | 122 +++++++++++++++++++++-----
 docs/tableconfig_schema.rst | 209 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 309 insertions(+), 23 deletions(-)

diff --git a/docs/admin_guide.rst b/docs/admin_guide.rst
index 51c5ec0..c69ada6 100644
--- a/docs/admin_guide.rst
+++ b/docs/admin_guide.rst
@@ -26,6 +26,7 @@ Admin Guide
 .. toctree::
    :maxdepth: 1
 
+   tableconfig_schema
    in_production
    management_api
    pinot_hadoop
diff --git a/docs/in_production.rst b/docs/in_production.rst
index 520137c..9201411 100644
--- a/docs/in_production.rst
+++ b/docs/in_production.rst
@@ -17,51 +17,127 @@
 .. under the License.
 ..
 
-Running Pinot in production
+Running Pinot in Production
 ===========================
 
-Installing Pinot
-----------------
-
 Requirements
 ~~~~~~~~~~~~
 
-* Java 8+
-* Several nodes with enough memory
-* A working installation of Zookeeper
+You will need the following in order to run pinot in production:
 
-Recommended environment
-~~~~~~~~~~~~~~~~~~~~~~~
-
-* Shared storage infrastructure (such as NFS)
-* Regular Zookeeper backups
-* HTTP load balancers (such as nginx/haproxy)
+* Hardware for controller/broker/servers as per your load
+* Working installation of Zookeeper that Pinot can use. We recommend setting 
aside a path within Zookeeper and including that path in pinot.controller.zkStr. 
Pinot will create its own cluster under this path (cluster name decided by 
pinot.controller.helixClusterName)
+* Shared storage mounted on controllers (if you plan to have multiple 
controllers for the same cluster). Alternatively, an implementation of PinotFS 
that the Pinot hosts have access to.
+* HTTP load balancers for spraying queries across brokers (or other mechanism 
to balance queries)
+* HTTP load balancers for spraying controller requests (e.g. segment push, or 
other controller APIs) or other mechanisms for distribution of these requests.
 
 Deploying Pinot
----------------
+~~~~~~~~~~~~~~~
+
+When deploying Pinot services, it is best to deploy the components in a 
specific order. If a release contains protocol or other significant changes, 
deploying in a predictable order helps avoid failures caused by those changes.
 
-Direct deployment of Pinot
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+The ordering is as follows:
 
-Deployment of Pinot on Kubernetes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#. pinot-controller
+#. pinot-broker
+#. pinot-server
+#. pinot-minion
 
 Managing Pinot
---------------
+~~~~~~~~~~~~~~
 
 Creating tables
-~~~~~~~~~~~~~~~
+---------------
 
 Updating tables
-~~~~~~~~~~~~~~~
+---------------
 
 Uploading data
-~~~~~~~~~~~~~~
+--------------
 
 Configuring realtime data ingestion
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-----------------------------------
 
 Monitoring Pinot
 ~~~~~~~~~~~~~~~~
 
+Pinot exposes several metrics to monitor the service and ensure that pinot 
users are not experiencing issues. In this section we discuss some of the key 
metrics that are useful to monitor. A full list of metrics is available in the 
`Metrics <customizations.html#metrics>`_ section.
+
+Pinot Server
+------------
+
+* Missing Segments - `NUM_MISSING_SEGMENTS 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java>`_
+
+  * Number of missing segments that the broker queried for (expected to be on 
the server) but the server didn't have. This can be due to retention or stale 
routing table.
+
+* Query latency - `TOTAL_QUERY_TIME 
<https://github.com/apache/incubator-pinot/blob/ce2d9ee9dc73b2d7273a63a4eede774eb024ea8f/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerQueryPhase.java>`_
+
+  * The total time taken by the server to process a query
+
+* Query Execution Exceptions - `QUERY_EXECUTION_EXCEPTIONS 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java>`_
+
+  * The number of exceptions that occurred during query execution
+
+* Realtime Consumption Status - `LLC_PARTITION_CONSUMING 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerGauge.java>`_
+
+  * This gives a binary value based on whether low-level consumption is 
healthy (1) or unhealthy (0). It's important to ensure at least a single 
replica of each partition is consuming
+
+* Realtime Highest Offset Consumed - `HIGHEST_STREAM_OFFSET_CONSUMED 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerGauge.java>`_
+
+  * The highest offset which has been consumed so far.
+
+Pinot Broker
+------------
+
+* Incoming QPS (per broker) - `QUERIES 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+  * The rate at which an individual broker is receiving queries. Units are in QPS.
+
+* Dropped Requests - `REQUEST_DROPPED_DUE_TO_SEND_ERROR 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_,
 `REQUEST_DROPPED_DUE_TO_CONNECTION_ERROR 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_,
 `REQUEST_DROPPED_DUE_TO_ACCESS_ERROR 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/ap
 [...]
+
+  * These metrics indicate that a query was dropped, i.e. the 
processing of that query was abandoned for some reason.
+
+* Partial Responses - `BROKER_RESPONSES_WITH_PARTIAL_SERVERS_RESPONDED 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+  * Indicates a count of partial responses. A partial response is when at 
least 1 of the requested servers fails to respond to the query.
+
+* Table QPS quota exceeded - `QUERY_QUOTA_EXCEEDED 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+  * Binary metric which will indicate when the configured QPS quota for a 
table is exceeded (1) or if there is capacity remaining (0).
+
+* Table QPS quota usage percent - `QUERY_QUOTA_CAPACITY_UTILIZATION_RATE 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerGauge.java>`_
+
+  * Percentage of the configured QPS quota being utilized.
+
+Pinot Controller
+----------------
+
+Many of the controller metrics include a table name and thus are dynamically 
generated in the code. The metrics below point to the classes which generate 
the corresponding metrics.
+
+To get the real metric name, the easiest route is to spin up a controller 
instance, create a table with the desired name and look through the generated 
metrics.
+
+.. todo::
+
+  Give a more detailed explanation of how metrics are generated, how to 
identify real metrics names and where to find them in the code.
+
+* Percent Segments Available - `PERCENT_SEGMENTS_AVAILABLE 
<https://github.com/apache/incubator-pinot/blob/ce2d9ee9dc73b2d7273a63a4eede774eb024ea8f/pinot-common/src/main/java/org/apache/pinot/common/metrics/ControllerGauge.java>`_
+
+  * Percentage of segments with at least one online replica in external view 
as compared to the total number of segments in ideal state.
+
+* Segments in Error State - `SEGMENTS_IN_ERROR_STATE 
<https://github.com/apache/incubator-pinot/blob/ce2d9ee9dc73b2d7273a63a4eede774eb024ea8f/pinot-common/src/main/java/org/apache/pinot/common/metrics/ControllerGauge.java>`_
+
+  * Number of segments in an ``ERROR`` state for a given table.
+
+* Last push delay - Generated in the `ValidationMetrics 
<https://github.com/apache/incubator-pinot/blob/ce2d9ee9dc73b2d7273a63a4eede774eb024ea8f/pinot-common/src/main/java/org/apache/pinot/common/metrics/ValidationMetrics.java>`_
 class.
+
+  * The time in hours since the last time an offline segment has been pushed 
to the controller.
+
+* Percent of replicas up - `PERCENT_OF_REPLICAS 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ControllerGauge.java>`_
+
+  * Percentage of complete online replicas in external view as compared to 
replicas in ideal state.
+
+* Table storage quota usage percent - `TABLE_STORAGE_QUOTA_UTILIZATION 
<https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ControllerGauge.java>`_
+
+  * Shows how much of the table's storage quota is currently being used. The 
metric is a percentage of the entire quota.
+
 
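The Realtime Consumption Status guidance in the patch above (at least one replica of each partition should be consuming) could be checked along the following lines. This is a minimal sketch and is not part of the commit: the function name and the input shape (a mapping from a hypothetical ``(partition, server)`` pair to the ``LLC_PARTITION_CONSUMING`` gauge value) are assumptions for illustration, and how the gauge values are collected depends on the metrics system in use.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def partitions_without_consumer(
    gauge_values: Dict[Tuple[int, str], int]
) -> List[int]:
    """Return partitions for which no replica reports a gauge value of 1.

    gauge_values maps (partition, server) to the reported gauge value,
    where 1 means healthy consumption and 0 means unhealthy. This input
    shape is an assumption made for the example.
    """
    any_healthy = defaultdict(bool)
    for (partition, _server), value in gauge_values.items():
        # A partition is fine as soon as one replica is consuming.
        any_healthy[partition] = any_healthy[partition] or value == 1
    return sorted(p for p, ok in any_healthy.items() if not ok)


# Hypothetical sample: partition 0 has one healthy replica, partition 1 has none.
sample = {
    (0, "server-a"): 1,
    (0, "server-b"): 0,
    (1, "server-a"): 0,
    (1, "server-b"): 0,
}
stalled = partitions_without_consumer(sample)
print(stalled)
```

An alert would fire only when the returned list is non-empty, so a single unhealthy replica does not page anyone while another replica of the same partition is still consuming.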
diff --git a/docs/tableconfig_schema.rst b/docs/tableconfig_schema.rst
new file mode 100644
index 0000000..202564f
--- /dev/null
+++ b/docs/tableconfig_schema.rst
@@ -0,0 +1,209 @@
+..
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+..
+
+Table Config
+============
+
+Sample table config and descriptions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A sample table config is shown below with its sub-sections collapsed. The 
sub-sections are described individually in the following sections. Links to 
feature-specific documentation are included where available.
+
+``tableName`` - Should only contain alpha-numeric characters, hyphens ('-'), 
or underscores ('_'). A double underscore ('__') is not allowed, because it is 
reserved for other features within Pinot.
+
+``tableType`` - Indicates the type of the table. There are some settings 
specific to each type. This will be clarified below as each sub-section is 
explained.
+
+  * Allowed values:
+
+    * ``OFFLINE`` - An offline table is used to host data which might be 
periodically uploaded - daily, weekly, etc. More information on `Offline Tables 
<architecture.html#ingesting-offline-data>`_
+    * ``REALTIME`` - A realtime table is used to consume data from incoming 
data streams and serve this data in a near-realtime manner. More information on 
`Realtime Tables <architecture.html#ingesting-realtime-data>`_
+
+.. code-block:: none
+
+    {
+      "tableName": "myPinotTable",
+      "tableType": "REALTIME",
+      "segmentsConfig": {...},
+      "tableIndexConfig": {...},
+      "tenants": {...},
+      "routing": {...},
+      "task": {...},
+      "metadata": {...}
+    }
+
+Some sections are required, otherwise the table config will be rejected by 
pinot-controller. The required sections are:
+
+* ``tableName``
+* ``tableType``
+* ``"segmentsConfig": {...}``
+* ``"tableIndexConfig": {...}``
+* ``"tenants": {...}``
+* ``"metadata": {...}``
+
+Segments Config Section
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``segmentsConfig`` section has information about configuring the following:
+
+* Segment Retention - with the ``retentionTimeUnit`` and 
``retentionTimeValue`` options. Retention is only applicable to tables whose 
``segmentPushType`` is ``APPEND``.
+
+  * Allowed values:
+
+    * ``retentionTimeUnit`` - ``DAYS``
+    * ``retentionTimeValue`` - Positive integers
+
+* ``segmentPushFrequency`` - to indicate how frequently segments are uploaded.
+
+  * Allowed values - ``daily``, ``hourly``
+
+* ``segmentPushType`` - Indicates the type of push to the table.
+
+  * Allowed values:
+
+    * ``APPEND`` means new data will be pushed and appended to the current 
data in the table. All realtime tables *must* be explicitly set to ``APPEND``.
+    * ``REFRESH`` will refresh the entire dataset contained within the table. 
Segment retention is ignored when set to ``REFRESH``.
+
+* ``replication`` - Number of replicas of data in a table, used for offline 
tables only.
+
+  * Allowed values - Positive integers
+
+* ``replicasPerPartition`` - Number of replicas of data in a table, used for 
realtime LLC tables only.
+
+  * Allowed values - Positive integers
+
+* Time column - using ``timeColumnName`` and ``timeType``, this must match 
what's configured in the corresponding schema
+
+  * Allowed values - String, this must match the ``timeFieldSpec`` section in 
the schema
+
+* Segment assignment strategy - Described more on the page `Customizing Pinot 
<customizations.html#segment-assignment-strategies>`_
+
+
+.. code-block:: none
+
+    "segmentsConfig": {
+      "retentionTimeUnit": "DAYS",
+      "retentionTimeValue": "5",
+      "segmentPushFrequency": "daily",
+      "segmentPushType": "APPEND",
+      "replication": "3",
+      "replicasPerPartition": "3",
+      "schemaName": "myPinotSchema",
+      "timeColumnName": "daysSinceEpoch",
+      "timeType": "DAYS",
+      "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy"
+    },
+
+Table Index Config Section
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``tableIndexConfig`` section has information about how to configure:
+
+* ``invertedIndexColumns`` - Indicates a list of real column names as 
specified in the schema to create inverted indexes for. More info on indexes 
can be found on the `Index Techniques <index_techniques.html>`_ page.
+
+  * Allowed values - String; string must match the column name in the 
corresponding schema
+
+* ``noDictionaryColumns`` - Indicates a list of real column names as specified 
in the schema. Column names present will **not** have a dictionary created. 
More info on indexes can be found on the `Index Techniques 
<index_techniques.html>`_ page.
+
+  * Allowed values - String; string must match the column name in the 
corresponding schema
+
+* ``sortedColumn`` - Indicates a list of real column names as specified in the 
schema. Data should be sorted based on the column names provided. This field 
needs to be set only for realtime tables. For offline, if the data at source is 
sorted, we will create a sorted index automatically. More info on indexes can 
be found on the `Index Techniques <index_techniques.html>`_ page.
+
+  * Allowed values - String; string must match the column name in the 
corresponding schema
+
+* ``aggregateMetrics`` - Switch for the aggregate metrics feature. This 
feature will aggregate realtime stream data as it is consumed, where 
applicable, in order to reduce segment sizes. We sum the metric column values 
of all rows that have the same value for dimension columns and create one row 
in a realtime segment for all such rows. This feature is only available on 
REALTIME tables.
+
+  * Allowed values - ``true`` to enable, ``false`` to disable.
+
+.. todo::
+
+  Create a separate section to describe this feature and design, then link to 
it from this config description
+
+* ``segmentPartitionConfig`` - Configures the Data Partitioning Strategy. 
Further documentation on this feature is available in the `Data Partitioning 
Strategies <customizations.html#data-partitioning-strategies>`_ section.
+* ``loadMode`` - Indicates how data will be loaded on pinot-server. Either 
``"MMAP"`` or ``"HEAP"`` can be configured.
+
+  * Allowed values:
+
+    * ``MMAP`` - Configures pinot-server to load data segments to off-heap 
memory.
+    * ``HEAP`` - Configures pinot-server to load data directly into direct 
memory.
+
+* ``streamConfigs`` - This section is where the bulk of the settings specific 
to only REALTIME tables are found. These options are explained in detail in the 
`Pluggable Streams <pluggable_streams.html#pluggable-streams>`_ page.
+
+.. code-block:: none
+
+    "tableIndexConfig": {
+      "invertedIndexColumns": [],
+      "sortedColumn": [
+        "nameOfSortedColumn"
+      ],
+      "noDictionaryColumns": [
+        "nameOfNoDictionaryColumn"
+      ],
+      "aggregateMetrics": "true",
+      "segmentPartitionConfig": {
+        "columnPartitionMap": {
+          "contentId": {
+            "functionName": "murmur",
+            "numPartitions": 32
+          }
+        }
+      },
+      "loadMode": "MMAP",
+      "lazyLoad": "false",
+      "segmentFormatVersion": "v3",
+      "streamConfigs": {}
+    },
+
+Tenants Section
+~~~~~~~~~~~~~~~
+
+The ``tenants`` section has two main config fields in it. These fields are 
used to configure which tenants are used within Helix.
+
+.. code-block:: none
+
+    "tenants": {
+      "broker": "brokerTenant",
+      "server": "serverTenant"
+    },
+
+Routing Section
+~~~~~~~~~~~~~~~
+
+The ``routing`` section configures which routingTableBuilder will be used 
and passes options specific to that builder. 
There is more information in the `Routing Strategies 
<customizations.html#routing-strategies>`_ section.
+
+.. code-block:: none
+
+    "routing": {
+      "routingTableBuilderName": "PartitionAwareRealtime",
+      "routingTableBuilderOptions": {}
+    },
+
+Metadata Section
+~~~~~~~~~~~~~~~~
+
+The ``metadata`` section is used for passing special key-value pairs into 
Pinot which will be stored with the table config inside of Pinot. There's more 
info in the `Custom Configs <customizations.html#custom-configs>`_ section.
+
+.. code-block:: none
+
+    "metadata": {
+      "customConfigs": {
+        "specialConfig": "testValue",
+        "anotherSpecialConfig": "value"
+      }
+    }

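The naming rule and required sections described in tableconfig_schema.rst above could be checked client-side before a config is submitted. The following is a minimal sketch and is not part of the commit: the function name and the error messages are hypothetical, and pinot-controller performs its own validation, which may differ from this.

```python
import re
from typing import Any, Dict, List

# Required sections as listed in the Table Config documentation above.
REQUIRED_KEYS = [
    "tableName",
    "tableType",
    "segmentsConfig",
    "tableIndexConfig",
    "tenants",
    "metadata",
]
# Alpha-numeric characters, hyphens, and underscores only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
ALLOWED_TYPES = ("OFFLINE", "REALTIME")


def check_table_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = [
        f"missing required section: {key}"
        for key in REQUIRED_KEYS
        if key not in config
    ]
    name = config.get("tableName")
    if isinstance(name, str):
        if not NAME_PATTERN.fullmatch(name):
            problems.append("tableName contains disallowed characters")
        if "__" in name:
            problems.append("tableName must not contain a double underscore")
    table_type = config.get("tableType")
    if table_type is not None and table_type not in ALLOWED_TYPES:
        problems.append("tableType must be OFFLINE or REALTIME")
    return problems


good = {
    "tableName": "myPinotTable",
    "tableType": "REALTIME",
    "segmentsConfig": {},
    "tableIndexConfig": {},
    "tenants": {},
    "metadata": {},
}
bad = {key: value for key, value in good.items() if key != "metadata"}
bad["tableName"] = "my__table"

print(check_table_config(good))
print(check_table_config(bad))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a single pass.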
