nicusX commented on code in PR #3: URL: https://github.com/apache/flink-connector-prometheus/pull/3#discussion_r1829138048
##########
docs/content/docs/connectors/datastream/prometheus.md:
##########
@@ -0,0 +1,473 @@
+---
+title: Prometheus
+weight: 5
+type: docs
+aliases:
+  - /dev/connectors/prometheus.html
+  - /apis/streaming/connectors/prometheus.html
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Prometheus Sink
+
+This sink connector can be used to write **data** to Prometheus-compatible storage, using the Prometheus [Remote Write](https://prometheus.io/docs/specs/remote_write_spec/) interface.
+
+The Prometheus-compatible backend must support the [Remote Write 1.0](https://prometheus.io/docs/specs/remote_write_spec/) standard API, and the Remote Write endpoint must be enabled.
+
+{{< hint warning >}}This connector is not meant for sending internal Flink metrics to Prometheus.
+To publish Flink metrics, for monitoring the health and operation of the Flink cluster, you should use
+[Metric Reporters](../../../deployment/metric_reporters/).{{< /hint >}}
+
+To use the connector, add the following Maven dependency to your project:
+
+{{< connector_artifact flink-connector-prometheus prometheus >}}
+
+## Usage
+
+The Prometheus sink provides a builder class to build a `PrometheusSink` instance.
+The code snippet below shows
+how to build a `PrometheusSink` with a basic configuration, and an optional [request signer](#request-signer).
+
+```java
+PrometheusSink sink = PrometheusSink.builder()
+        .setPrometheusRemoteWriteUrl(prometheusRemoteWriteUrl)
+        .setRequestSigner(new AmazonManagedPrometheusWriteRequestSigner(prometheusRemoteWriteUrl, prometheusRegion)) // Optional
+        .build();
+```
+
+The only **required** configuration is `prometheusRemoteWriteUrl`. All other configurations are optional.
+
+If your sink has parallelism > 1, you need to ensure the stream is keyed using the `PrometheusTimeSeriesLabelsAndMetricNameKeySelector`
+key selector, so that all samples of the same time-series land in the same partition and their order is not lost.
+See [Sink parallelism and keyed streams](#sink-parallelism-and-keyed-streams) for more details.
+
+
+### Input data objects
+
+The sink expects `PrometheusTimeSeries` records as input.
+Your input data must be converted into `PrometheusTimeSeries`, using a map or flatMap operator, before being sent to the sink.
+
+`PrometheusTimeSeries` instances are immutable and cannot be reused. You can use the [builder](#populating-a-prometheustimeseries)
+to create and populate instances.
+
+A `PrometheusTimeSeries` represents a single time-series record when sent to the Remote Write interface. Each time-series
+record may contain multiple samples.
+
+{{< hint info >}}
+In the context of Prometheus, the term "time-series" is overloaded.
+It means both *a series of samples with a unique set of labels* (a time-series in the underlying time-series database)
+and *a record sent to the Remote Write interface*. A `PrometheusTimeSeries` instance represents a record sent to the interface.
+
+The two concepts are related, because time-series "records" with the same set of labels are written to the same
+"database time-series".{{< /hint >}}
+
+Each `PrometheusTimeSeries` record contains:
+
+- One **`metricName`**. A string that is translated into the value of the `__name__` label.
+- Zero or more **`Label`** entries. Each label has a `key` and a `value`, both `String`. Labels represent additional dimensions of the samples. Duplicate label keys are not allowed.
+- One or more **`Sample`** entries. Each sample has a `value` (`double`) representing the measure, and a `timestamp` (`long`) representing the time of the measure, in milliseconds from the Epoch. Duplicate timestamps in the same record are not allowed.
+
+The following pseudocode represents the structure of a `PrometheusTimeSeries` record:
+
+```
+PrometheusTimeSeries
+ + --> (1) metricName <String>
+ + --> (0..*) Label
+        + name <String>
+        + value <String>
+ + --> 1..* Sample
+        + timestamp <long>
+        + value <double>
+```

Review Comment:
Really? This is supposed to be pseudo-code, and it uses the standard cardinality notation commonly used in diagrams, such as ER or UML.

It is not a matter of being mandatory: Labels and Samples are lists. To convey the fact that one list may contain zero or more elements and the other must contain at least one element, it would become something like this:

```
 + --> metricName <String>   // mandatory
 + --> Label                 // list of zero or more elements
        + name <String>
        + value <String>
 + --> Sample                // list of one or more elements
        + timestamp <long>
        + value <double>
```

IMO this is more confusing, but I am okay with it if it looks clearer.

In the original there was a parenthesis missing.
Fixing it, it would be this:

```
 + --> (1) metricName <String>
 + --> (0..*) Label
        + name <String>
        + value <String>
 + --> (1..*) Sample
        + timestamp <long>
        + value <double>
```

Alternatively, this one:

```
 + --> (mandatory) metricName <String>
 + --> (list of zero or more) Label
        + name <String>
        + value <String>
 + --> (list of one or more) Sample
        + timestamp <long>
        + value <double>
```

Or, using something that looks more like an array notation:

```
 + --> metricName <String>   // mandatory
 + --> Label [0..*]
        + name <String>
        + value <String>
 + --> Sample [1..*]
        + timestamp <long>
        + value <double>
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
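[Editor's note] Whichever cardinality notation is chosen, the structure under discussion maps directly onto the `PrometheusTimeSeries` builder that the reviewed doc references. A minimal sketch for orientation, assuming the builder exposes `withMetricName`, `addLabel`, and `addSample` methods (names taken from the connector's builder API but not verified against this PR; metric and label values are made up):

```java
// Hypothetical illustration of the (1) metricName / (0..*) Label / (1..*) Sample structure.
PrometheusTimeSeries timeSeries = PrometheusTimeSeries.builder()
        .withMetricName("http_requests_total")        // (1) mandatory; becomes the __name__ label
        .addLabel("environment", "production")        // (0..*) labels: additional dimensions
        .addLabel("instance", "host-1")
        .addSample(42.0, System.currentTimeMillis())  // (1..*) samples: value + timestamp (ms from Epoch)
        .build();
```

The builder would be expected to enforce the constraints stated in the doc (no duplicate label keys, no duplicate timestamps within one record).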
