Re: [PR] Add blogpost for new Prometheus connector [flink-web]

via GitHub Wed, 27 Nov 2024 07:17:06 -0800


nicusX commented on code in PR #766:
URL: https://github.com/apache/flink-web/pull/766#discussion_r1860845426



##########
docs/content/posts/2024-11-26-introducing-new-prometheus-connector.md:
##########
@@ -0,0 +1,202 @@
+---
+title:  "Introducing the new Prometheus connector"
+date: "2024-11-26T00:00:00.000Z"
+authors:
+- nicusX:
+  name: "Lorenzo Nicora"
+---
+
+
+We are excited to announce a new sink connector that enables writing data to 
Prometheus 
([FLIP-312](https://cwiki.apache.org/confluence/display/FLINK/FLIP-312:+Prometheus+Sink+Connector)).
 This article introduces the main features of the connector and the reasoning 
behind its design decisions.
+
+This connector allows writing data to Prometheus using the 
[Remote-Write](https://prometheus.io/docs/specs/remote_write_spec/) push 
interface, which lets you write time-series data to Prometheus at scale.
+
+## Motivations for a Prometheus connector
+
+Prometheus is an efficient time-series database optimized for building 
real-time dashboards and alerts, typically in combination with Grafana or other 
visualization tools.
+
+Prometheus is commonly used to monitor compute resources, IT infrastructure, 
Kubernetes clusters, applications, and cloud resources. It can also be used to 
observe your Flink cluster and Flink jobs. Flink's existing [Metric 
Reporters](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/metric_reporters/)
 serve this purpose. 
+
+So, why do we need a connector?
+
+Prometheus can serve as a general-purpose observability time-series database, 
beyond traditional infrastructure monitoring. For example, it can be used to 
monitor IoT devices, sensors, connected cars, media streaming devices, and any 
resource that streams events or measurements continuously.
+
+Observability data from these use cases differs from metrics generated by 
compute resources and presents additional challenges:
+* **Out-of-order events**: Devices may be connected via mobile networks or 
even Bluetooth. Events from different devices may follow different paths and 
arrive at very different times. **Stateful, event-time logic** can be used to 
reorder them.
+* **High frequency** and **high cardinality**: You can have a very large number 
of devices, each emitting signals multiple times per second. **Aggregating over 
time** and **over dimensions** can reduce frequency and cardinality and make 
the volume of data easier to analyze.
+* **Lack of contextual information**: Raw events sent by the devices often 
lack the contextual information needed for meaningful analysis. **Enrichment** 
of raw events, by looking up a reference dataset, can add dimensions useful 
for the analysis.
+* **Noise**: Sensor measurements may contain noise, for example when a GPS 
tracker loses connection and reports spurious positions. These obvious outliers 
can be **filtered** out to simplify visualization and analysis.
+
+Flink can be used as a pre-processor to address all the above. 
+
+You can implement a sink from scratch or use AsyncIO to call the Prometheus 
Remote-Write endpoint. However, implementing an efficient Remote-Write client 
involves non-trivial details:
+* There is no high-level client for Prometheus Remote-Write. You would need to 
build on top of a low-level HTTP client.
+* Remote-Write can be inefficient unless write requests are batched and 
parallelized.
+* Error handling can be complex, and specifications demand strict behaviors 
(see [Strict Specifications, Lenient 
Implementations](#strict-specifications-lenient-implementations)).
+
+The new Prometheus connector manages all of this for you.
+
+## Key features
+
+Version `1.0.0` of the Prometheus connector has the following features:
+
+* DataStream API, Java Sink, based on AsyncSinkBase.
+* Configurable write batching.
+* Order is retained on retries.
+* At-most-once guarantees (we’ll explain the reason for this later).

Review Comment:
   Prometheus is at-most-once by design: it favors fast ingestion and data 
freshness over consistency. If you try to make a Prometheus sink at-least-once, 
Prometheus will spit in your face :D 
   If you make the connector fail when Prometheus rejects your data, you are 
doomed to an endless loop of restarts from checkpoint.
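
   The reasoning above can be illustrated with a minimal, self-contained Java 
sketch. This is not the connector's code: `FakePrometheus`, `run`, and the 
"checkpoint" at the start of the input are hypothetical simplifications. The 
only rule modelled is the assumption that a sample older than the newest 
accepted one is rejected.

```java
import java.util.ArrayList;
import java.util.List;

public class RejectionPolicySketch {

    /** Hypothetical stand-in for Prometheus: rejects samples older than the newest accepted one. */
    static final class FakePrometheus {
        private long newest = Long.MIN_VALUE;
        final List<Long> stored = new ArrayList<>();

        boolean write(long timestamp) {
            if (timestamp < newest) {
                return false; // out-of-order sample: rejected
            }
            if (timestamp > newest) {
                newest = timestamp;
                stored.add(timestamp);
            }
            return true; // equal timestamp: treated as a harmless repeat
        }
    }

    /**
     * Writes the input in order. If failOnRejection is true, a rejected write
     * restarts the job and replays the input from the start (the "checkpoint").
     * Returns the number of restarts, or -1 if the job still had not finished
     * after maxRestarts.
     */
    static int run(long[] input, boolean failOnRejection, int maxRestarts,
                   FakePrometheus prometheus) {
        int restarts = 0;
        int i = 0;
        while (i < input.length) {
            boolean accepted = prometheus.write(input[i]);
            if (!accepted && failOnRejection) {
                if (restarts == maxRestarts) {
                    return -1; // gave up: every replay is rejected again
                }
                restarts++;
                i = 0; // restart from checkpoint and replay everything
                continue;
            }
            i++; // accepted, or rejected and discarded
        }
        return restarts;
    }

    public static void main(String[] args) {
        long[] input = {1, 2, 5, 3, 6}; // the sample with timestamp 3 arrives out of order
        System.out.println("fail on rejection: "
                + run(input, true, 10, new FakePrometheus()));
        System.out.println("discard and continue: "
                + run(input, false, 10, new FakePrometheus()));
    }
}
```

   In this toy model the fail-on-rejection run never completes, because each 
replay immediately hits samples the store already considers too old, while the 
discard-and-continue run drops the single out-of-order sample and finishes.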



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
