jgutmann commented on a change in pull request #3975: ReadTheDocs documentation for Table Configs, Monitoring, and Deployment
URL: https://github.com/apache/incubator-pinot/pull/3975#discussion_r268308656
##########
File path: docs/in_production.rst
##########
@@ -64,4 +67,61 @@ Configuring realtime data ingestion
 Monitoring Pinot
 ~~~~~~~~~~~~~~~~
+Pinot exposes several metrics for monitoring the service and ensuring that Pinot users are not experiencing issues. This section covers some of the key metrics worth monitoring. A full list of metrics is available in the `Metrics <customizations.html#metrics>`_ section.
+
+Pinot Server
+^^^^^^^^^^^^
+
+* Missing Segments - `NUM_MISSING_SEGMENTS <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java>`_
+
+  * Number of segments that the broker queried for (expecting them to be on the server) but that the server did not have. This can be caused by retention or by a stale routing table.
+
+* Query latency - `TOTAL_QUERY_TIME <https://github.com/apache/incubator-pinot/blob/ce2d9ee9dc73b2d7273a63a4eede774eb024ea8f/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerQueryPhase.java>`_
+
+  * Total time taken by the server to process a query.
+
+* Query Execution Exceptions - `QUERY_EXECUTION_EXCEPTIONS <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java>`_
+
+  * The number of exceptions that occurred during query execution.
+
+* Realtime Consumption Status - `LLC_PARTITION_CONSUMING <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerGauge.java>`_
+
+  * Binary value indicating whether low-level consumption is healthy (1) or unhealthy (0). Ensure that at least one replica of each partition is consuming.
+
+* Realtime Highest Offset Consumed - `HIGHEST_STREAM_OFFSET_CONSUMED <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerGauge.java>`_
+
+  * The highest stream offset consumed so far.
+
+Pinot Broker
+^^^^^^^^^^^^
+
+* Incoming QPS (per broker) - `QUERIES <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+  * The rate at which an individual broker receives queries, in queries per second (QPS).
+
+* Dropped Requests - `REQUEST_DROPPED_DUE_TO_SEND_ERROR <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_, `REQUEST_DROPPED_DUE_TO_CONNECTION_ERROR <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_, `REQUEST_DROPPED_DUE_TO_ACCESS_ERROR <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+  * These metrics indicate that a query was dropped, i.e., its processing was abandoned. Each metric name identifies the cause: a send error, a connection error, or an access error.
+
+* Partial Responses - `BROKER_RESPONSES_WITH_PARTIAL_SERVERS_RESPONDED <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+  * Count of partial responses. A response is partial when at least one of the queried servers fails to respond.
+
+* Table QPS quota exceeded - `QUERY_QUOTA_EXCEEDED <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java>`_
+
+  * Binary metric indicating whether the configured QPS quota for a table has been exceeded (1) or still has capacity remaining (0).
+
+* Table QPS quota usage percent - `QUERY_QUOTA_CAPACITY_UTILIZATION_RATE <https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerGauge.java>`_
+
+  * Percentage of the configured QPS quota being utilized.
+
+Pinot Controller

Review comment:
I am not sure where these controller metrics are created. Could someone point me to the relevant place in the code so that I can link to it from here?
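The LLC_PARTITION_CONSUMING note says at least one replica of each partition should be consuming. Below is a minimal sketch of that check, assuming the gauge values have already been collected into a mapping keyed by (partition, server). The function name, the data layout, and the server names are hypothetical illustrations, not part of Pinot; how the values are scraped (JMX, a metrics reporter, etc.) is outside the sketch.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (partition_id, server_name) -> last observed value of the
# LLC_PARTITION_CONSUMING gauge (1 = consuming/healthy, 0 = not consuming).
GaugeSamples = Dict[Tuple[int, str], int]


def partitions_without_consuming_replica(
    samples: GaugeSamples, expected_partitions: Iterable[int] = ()
) -> List[int]:
    """Return the sorted partitions for which no replica reports a value of 1."""
    # Seed with the expected partitions so that a partition reporting no samples
    # at all is still flagged, not silently ignored.
    consuming: Dict[int, int] = {p: 0 for p in expected_partitions}
    for (partition, _server), value in samples.items():
        consuming.setdefault(partition, 0)
        if value == 1:
            consuming[partition] += 1
    return sorted(p for p, count in consuming.items() if count == 0)


if __name__ == "__main__":
    samples = {
        (0, "server-1"): 1,
        (0, "server-2"): 0,
        (1, "server-1"): 0,
        (1, "server-2"): 0,
    }
    # Partition 1 has no consuming replica and partition 2 reports nothing.
    print(partitions_without_consuming_replica(samples, expected_partitions=[0, 1, 2]))
```

Passing the expected partition list matters: a check that only looks at reported samples cannot notice a partition that no server reports at all.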
