Liu created FLINK-39203:
---------------------------
Summary: Support Runtime Data Sampling for Operators with WebUI Visualization
Key: FLINK-39203
URL: https://issues.apache.org/jira/browse/FLINK-39203
Project: Flink
Issue Type: Improvement
Reporter: Liu
Attachments: image-2026-03-04-10-23-59-367.png
h1. Motivation
Debugging data issues in streaming jobs is a long-standing pain point in
Flink. Currently, users who want to inspect intermediate data flowing through
operators at runtime have to resort to one of the following:
# Add a print() sink or connect an external system (e.g., Kafka), modify the
topology, and resubmit the job.
# Use Table.execute().collect() / executeAndCollect(), which only work for
final sink output and require code changes.
# Inspect external storage systems after data is written out.
All of these approaches require either modifying and restarting the job or
relying on external infrastructure, which makes quick data validation and
troubleshooting very inconvenient.
Meanwhile, similar capabilities are widely available in competing systems:
# Apache Spark UI supports Stage-level data preview.
# Commercial streaming platforms provide built-in data preview.
# Apache Beam supports pipeline visual debugging.
*Proposal:* Introduce a native runtime data sampling capability in Flink that
allows users to dynamically sample records at operator output without
restarting the job, and visualize the sampled data directly in the Flink WebUI.
h1. Goals
# Support on-demand data sampling at the output side of any operator in a
running job.
# Provide a REST API to trigger and retrieve sampling results.
# Visualize sampled records in the Flink WebUI (using toString() for
non-binary data).
# Ensure the feature has zero overhead when not in use (disabled by default).
# Provide safety mechanisms: configurable sample size limits, TTL-based
automatic cleanup, and memory usage caps.
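To make the safety-related goals more concrete, the following is a minimal sketch of a bounded sample buffer that stays inactive by default and enforces a record limit, a byte cap, and a TTL. It is an illustration only, not the proposed implementation: the class name SampleBuffer, its constructor parameters, and its methods are hypothetical and do not correspond to any existing Flink API. Timestamps are passed in explicitly so that the behavior is easy to check.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bounded buffer for sampled operator output (not a Flink class). */
final class SampleBuffer {
    private final int maxRecords;   // assumed limit on the number of sampled records
    private final long maxBytes;    // assumed cap on buffered bytes (UTF-8 of toString())
    private final long ttlMillis;   // assumed lifetime of one sampling session

    private final List<String> records = new ArrayList<>();
    private long usedBytes = 0;
    private long expiresAtMillis = 0;
    private volatile boolean enabled = false; // disabled by default

    SampleBuffer(int maxRecords, long maxBytes, long ttlMillis) {
        this.maxRecords = maxRecords;
        this.maxBytes = maxBytes;
        this.ttlMillis = ttlMillis;
    }

    /** Starts a new sampling session, discarding any earlier samples. */
    synchronized void start(long nowMillis) {
        records.clear();
        usedBytes = 0;
        expiresAtMillis = nowMillis + ttlMillis;
        enabled = true;
    }

    /** Called on the output path; returns true only if the record was kept. */
    boolean offer(Object record, long nowMillis) {
        if (!enabled) {
            return false; // inactive path: a single flag check, nothing is buffered
        }
        synchronized (this) {
            if (!enabled) {
                return false;
            }
            if (nowMillis >= expiresAtMillis) {
                expire();
                return false;
            }
            if (records.size() >= maxRecords) {
                return false; // sample size limit reached
            }
            String text = String.valueOf(record); // toString() rendering
            long size = text.getBytes(StandardCharsets.UTF_8).length;
            if (usedBytes + size > maxBytes) {
                return false; // memory cap reached
            }
            records.add(text);
            usedBytes += size;
            return true;
        }
    }

    /** Returns a copy of the current samples; empty once the TTL has passed. */
    synchronized List<String> snapshot(long nowMillis) {
        if (enabled && nowMillis >= expiresAtMillis) {
            expire();
        }
        return new ArrayList<>(records);
    }

    // TTL-based cleanup: drop the samples and switch sampling off again.
    private void expire() {
        records.clear();
        usedBytes = 0;
        enabled = false;
    }
}
```

In this sketch the inactive path costs one volatile read per record. An actual implementation could likely go further and not install any sampling hook at all until a request arrives, which would be closer to the zero-overhead goal.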
h1. High-Level Design
The implementation follows the proven architecture pattern of the existing
FlameGraph feature (FLINK-13550), which also provides on-demand dynamic
sampling via REST API → JobManager (JM) coordination → TaskManager (TM)-side
execution → WebUI rendering.
!image-2026-03-04-10-23-59-367.png|width=525,height=441!
A detailed design document (FLIP) will follow this JIRA for community
discussion.