talatuyarer commented on code in PR #29794:
URL: https://github.com/apache/beam/pull/29794#discussion_r1432969802


##########
website/www/site/content/en/blog/apache-beam-flink-and-kubernetes-part2.md:
##########
@@ -0,0 +1,161 @@
+---
+title:  "Build a scalable, self-managed streaming infrastructure with Beam and 
Flink: Tackling Autoscaling Challenges - Part 2"
+date:   2023-12-18 09:00:00 -0400
+categories:
+  - blog
+authors:
+  - talat
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+Welcome back to Part 2 of our in-depth series on building and managing a service for Apache Beam on Flink and Kubernetes. In this segment, we're taking a closer look at the hurdles we encountered while implementing autoscaling.
+
+<!--more-->
+
+# Build a scalable, self-managed streaming infrastructure with Beam and Flink: Tackling Autoscaling Challenges - Part 2
+
+## Introduction
+
+Welcome back to Part 2 of our in-depth series on building and managing a service for Apache Beam on Flink and Kubernetes. In this segment, we're taking a closer look at the hurdles we encountered while implementing autoscaling. These challenges weren't just roadblocks; they were opportunities for us to innovate and enhance our system. Let's break down these issues, understand their context, and explore the solutions we developed.
+
+## Understanding Apache Beam Backlog Metrics in the Flink Runner Environment
+
+**The Challenge:** In our current setup, we are using Apache Flink for 
processing data streams. However, we've encountered a puzzling issue: our Flink 
job isn't showing the backlog metrics from Apache Beam. These metrics are 
critical for understanding the state and performance of our data pipelines.
+
+**What We Found:** Interestingly, we noticed that the metrics are actually being generated in KafkaIO, the Beam connector our pipeline uses to read from Kafka. But when we tried to monitor these metrics through the Apache Flink metric system, they were nowhere to be found. This led us to suspect an issue with the integration (or 'wiring') between Apache Beam and Apache Flink.
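
To make the wiring question concrete, here's a minimal, illustrative sketch of how an unbounded source can report its backlog through Beam's `SourceMetrics` gauges. The class and method names below are hypothetical (this is not KafkaIO's actual code); only the `SourceMetrics` and `Gauge` APIs are real. The key point is that these gauge updates only reach a runner's metric system when a Beam metrics container is active on the reporting thread.

```java
import org.apache.beam.sdk.metrics.Gauge;
import org.apache.beam.sdk.metrics.SourceMetrics;

// Illustrative sketch, not KafkaIO's actual code: how an unbounded source can
// report its backlog through Beam's SourceMetrics gauges.
public class BacklogReportingSketch {

  private static final Gauge BACKLOG_BYTES = SourceMetrics.backlogBytes();
  private static final Gauge BACKLOG_ELEMENTS = SourceMetrics.backlogElements();

  /** Hypothetical hook a source could call whenever it recomputes its lag. */
  public void reportBacklog(long backlogByteCount, long backlogElementCount) {
    // Gauge updates are recorded against the MetricsContainer active on the
    // current thread; if no container is in scope, the values never make it
    // to the runner's (here, Flink's) metric system.
    BACKLOG_BYTES.set(backlogByteCount);
    BACKLOG_ELEMENTS.set(backlogElementCount);
  }
}
```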
+
+**Digging Deeper:** On closer inspection, we found that the metrics should be emitted during the 'Checkpointing' phase of stream processing. This is a crucial step where the system takes a snapshot of the stream's state, and it's typically where metrics are generated for 'Unbounded' sources (sources that continuously stream data, like Kafka).
+
+**A Potential Solution:** We believe the root of the problem lies in how the metric context is set during this checkpointing phase. There seems to be a disconnect that prevents the Beam metrics from being properly captured by the Flink metric system. We've proposed a fix for this issue, which you can review and contribute to in [Apache Beam PR #29793](https://github.com/apache/beam/pull/29793).
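
For readers who want the gist of the proposal without opening the PR, here is a simplified sketch of the idea. The class and method names are illustrative rather than the actual code in the pull request; the Beam APIs used (`MetricsEnvironment.scopedMetricsContainer` and the Flink runner's `FlinkMetricContainer`) are real.

```java
import java.io.Closeable;
import org.apache.beam.runners.flink.metrics.FlinkMetricContainer;
import org.apache.beam.sdk.metrics.MetricsContainer;
import org.apache.beam.sdk.metrics.MetricsEnvironment;

// Simplified sketch of the idea behind the proposed fix; names are
// illustrative and do not mirror the pull request line for line.
class ScopedSnapshotSketch {

  private final FlinkMetricContainer flinkMetricContainer;
  private final String stepName;

  ScopedSnapshotSketch(FlinkMetricContainer flinkMetricContainer, String stepName) {
    this.flinkMetricContainer = flinkMetricContainer;
    this.stepName = stepName;
  }

  /** Runs the source's snapshot logic with a Beam metrics container in scope. */
  void snapshotWithMetrics(Runnable snapshotLogic) throws Exception {
    MetricsContainer container = flinkMetricContainer.getMetricsContainer(stepName);
    // While this container is in scope, metric updates made during the
    // checkpoint (for example, the backlog gauges) are attributed to this step
    // and become visible through Flink's metric reporters.
    try (Closeable ignored = MetricsEnvironment.scopedMetricsContainer(container)) {
      snapshotLogic.run();
    }
  }
}
```

Scoping the metrics container around the snapshot is what ties the backlog gauges emitted at checkpoint time to the Flink metric system; the actual implementation is in the pull request linked above.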
+
+
+<img class="center-block"
+src="/images/blog/apache-beam-flink-and-kubernetes-part2/flink-backlog-metrics.png"
+alt="Apache Flink Beam Backlog Metrics">
+
+
+## Overcoming Challenges in Checkpoint Size Reduction for Autoscaling Beam Jobs
+

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
