Liu created FLINK-39203:
---------------------------

             Summary: Support Runtime Data Sampling for Operators with WebUI Visualization
                 Key: FLINK-39203
                 URL: https://issues.apache.org/jira/browse/FLINK-39203
             Project: Flink
          Issue Type: Improvement
            Reporter: Liu
         Attachments: image-2026-03-04-10-23-59-367.png

h1. Motivation

Debugging data issues in streaming jobs is a long-standing pain point in Flink. Currently, users who want to inspect the intermediate data flowing through operators at runtime have to resort to one of the following:
 # Add a print() sink or a sink to an external system (e.g., Kafka), which changes the topology and requires resubmitting the job.
 # Use Table.execute().collect() or DataStream.executeAndCollect(), which only expose the final output of the pipeline and also require code changes.
 # Inspect external storage systems after the data has been written out.

All of these approaches require either modifying and restarting the job or relying on external infrastructure, which makes quick data validation and troubleshooting extremely inconvenient.

Meanwhile, similar capabilities are widely available in competing systems:
 # Apache Spark UI supports stage-level data preview.
 # Commercial streaming platforms provide built-in data preview.
 # Apache Beam supports visual pipeline debugging.

*Proposal:* Introduce a native runtime data sampling capability in Flink that 
allows users to dynamically sample records at operator output without 
restarting the job, and visualize the sampled data directly in the Flink WebUI.
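The proposal hinges on sampling at an operator's output without restarting the job. A minimal Java sketch of such a TM-side hook follows, assuming the operator's output can be modeled as a java.util.function.Consumer; SamplingOutput and SampleBuffer are hypothetical names, not existing Flink classes. Records always flow downstream; only while a sample is armed is their toString() also captured, so the disabled path costs a single volatile read rather than literally nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical wrapper around an operator's output (not a Flink class). */
final class SamplingOutput<T> implements Consumer<T> {

    /** Collects up to {@code limit} rendered records; readable from another thread via snapshot(). */
    static final class SampleBuffer {
        private final int limit;
        private final List<String> records = new ArrayList<>();

        SampleBuffer(int limit) {
            if (limit < 1) {
                throw new IllegalArgumentException("limit must be >= 1");
            }
            this.limit = limit;
        }

        /** Adds one record and returns true while more records are still wanted. */
        synchronized boolean add(String record) {
            records.add(record);
            return records.size() < limit;
        }

        synchronized List<String> snapshot() {
            return new ArrayList<>(records);
        }
    }

    private final Consumer<T> downstream;
    // Holds null while no sampling is requested, so the hot path is one volatile read.
    private final AtomicReference<SampleBuffer> active = new AtomicReference<>();

    SamplingOutput(Consumer<T> downstream) {
        this.downstream = downstream;
    }

    /** Arms sampling for the next {@code limit} records, e.g. when a REST-triggered request arrives. */
    SampleBuffer startSampling(int limit) {
        SampleBuffer buffer = new SampleBuffer(limit);
        active.set(buffer);
        return buffer;
    }

    @Override
    public void accept(T record) {
        SampleBuffer buffer = active.get();
        if (buffer != null && !buffer.add(String.valueOf(record))) {
            // Limit reached: disarm, unless a newer request has replaced this buffer meanwhile.
            active.compareAndSet(buffer, null);
        }
        downstream.accept(record);
    }
}
```

Arming typically happens on a different thread than record processing (for example an RPC handler); the compareAndSet ensures that a buffer which fills up cannot accidentally disarm a newer request.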
h1. Goals
 # Support on-demand data sampling at the output side of any operator in a running job.
 # Provide a REST API to trigger and retrieve sampling results.
 # Visualize sampled records in the Flink WebUI (using toString() for non-binary data).
 # Ensure the feature has zero overhead when not in use (disabled by default).
 # Provide safety mechanisms: configurable sample size limits, TTL-based auto cleanup, and memory usage caps.
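The three safety mechanisms in the last goal could be enforced together in one place that holds finished samples until the WebUI fetches them. The following is a minimal sketch under stated assumptions: the class SampleResultStore, its parameters, and measuring memory as the UTF-8 size of the rendered strings are all hypothetical choices, not a defined Flink design.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongSupplier;

/** Hypothetical store for finished samples (not a Flink class); all limits are illustrative. */
final class SampleResultStore {
    private static final class Entry {
        final List<String> records;
        final long bytes;
        final long createdAtMillis;

        Entry(List<String> records, long bytes, long createdAtMillis) {
            this.records = records;
            this.bytes = bytes;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final int maxRecordsPerSample;  // sample size limit
    private final long maxTotalBytes;       // memory cap across all stored samples
    private final long ttlMillis;           // age after which a sample is cleaned up
    private final LongSupplier clockMillis; // assumed monotonic
    // Insertion order equals creation order, so the oldest samples come first.
    private final Map<String, Entry> entries = new LinkedHashMap<>();
    private long totalBytes;

    SampleResultStore(int maxRecordsPerSample, long maxTotalBytes, long ttlMillis, LongSupplier clockMillis) {
        this.maxRecordsPerSample = maxRecordsPerSample;
        this.maxTotalBytes = maxTotalBytes;
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    /** Stores a sample truncated to the record limit, evicting the oldest samples to stay under the cap. */
    synchronized void put(String requestId, List<String> records) {
        evictExpired();
        Entry previous = entries.remove(requestId);
        if (previous != null) {
            totalBytes -= previous.bytes;
        }
        List<String> kept = new ArrayList<>(records.subList(0, Math.min(records.size(), maxRecordsPerSample)));
        long bytes = 0;
        for (String record : kept) {
            bytes += record.getBytes(StandardCharsets.UTF_8).length;
        }
        if (bytes > maxTotalBytes) {
            return; // a single sample larger than the whole cap is dropped
        }
        Iterator<Entry> oldestFirst = entries.values().iterator();
        while (totalBytes + bytes > maxTotalBytes && oldestFirst.hasNext()) {
            totalBytes -= oldestFirst.next().bytes;
            oldestFirst.remove();
        }
        entries.put(requestId, new Entry(kept, bytes, clockMillis.getAsLong()));
        totalBytes += bytes;
    }

    /** Returns the sample if it exists and has not expired yet. */
    synchronized Optional<List<String>> get(String requestId) {
        evictExpired();
        Entry entry = entries.get(requestId);
        return entry == null ? Optional.empty() : Optional.of(List.copyOf(entry.records));
    }

    synchronized long totalBytes() {
        return totalBytes;
    }

    private void evictExpired() {
        long now = clockMillis.getAsLong();
        Iterator<Entry> oldestFirst = entries.values().iterator();
        while (oldestFirst.hasNext()) {
            Entry entry = oldestFirst.next();
            if (now - entry.createdAtMillis < ttlMillis) {
                break; // every later entry is newer, so nothing else has expired
            }
            totalBytes -= entry.bytes;
            oldestFirst.remove();
        }
    }
}
```

Injecting the clock keeps TTL expiry deterministic in tests. Because eviction scans oldest-first and stops at the first live entry, the clock should be monotonic (for example derived from System.nanoTime()) rather than wall-clock time.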

h1. High-Level Design

The proposed implementation follows the proven architecture of the existing FlameGraph feature (FLINK-13550), which likewise provides on-demand dynamic sampling through the chain REST API → JobManager (JM) coordination → TaskManager (TM)-side execution → WebUI rendering.

!image-2026-03-04-10-23-59-367.png|width=525,height=441!
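The JM coordination step in that chain can be pictured as a fan-out/fan-in over the subtasks of one job vertex, in the spirit of the FlameGraph request flow but not copied from it. In this minimal sketch, SubtaskSampleGateway stands in for the TM RPC and, like SampleRequestCoordinator, is a hypothetical name. Treating a timed-out or failed subtask as an empty sample is an assumed policy, chosen so that one stuck subtask cannot block the REST response.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the JM-to-TM RPC that returns one subtask's sampled records. */
interface SubtaskSampleGateway {
    CompletableFuture<List<String>> requestSample(int subtaskIndex, int maxRecords);
}

/** Hypothetical JM-side coordinator (not a Flink class). */
final class SampleRequestCoordinator {
    private final SubtaskSampleGateway gateway;
    private final long timeoutMillis;

    SampleRequestCoordinator(SubtaskSampleGateway gateway, long timeoutMillis) {
        this.gateway = gateway;
        this.timeoutMillis = timeoutMillis;
    }

    /** Sends one request per subtask and merges the answers, keyed by subtask index. */
    CompletableFuture<Map<Integer, List<String>>> sample(List<Integer> subtasks, int maxRecordsPerSubtask) {
        Map<Integer, CompletableFuture<List<String>>> pending = new TreeMap<>();
        for (int idx : subtasks) {
            pending.put(idx, gateway.requestSample(idx, maxRecordsPerSubtask)
                    // A slow or idle subtask yields an empty sample instead of blocking the response.
                    // Note: this completes the gateway's own future in place.
                    .completeOnTimeout(List.of(), timeoutMillis, TimeUnit.MILLISECONDS)
                    // A failed subtask likewise contributes nothing rather than failing the request.
                    .exceptionally(error -> List.of()));
        }
        return CompletableFuture.allOf(pending.values().toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    Map<Integer, List<String>> merged = new TreeMap<>();
                    pending.forEach((idx, future) -> merged.put(idx, future.join()));
                    return merged;
                });
    }
}
```

A real implementation would probably also cancel the outstanding request on the TM when the timeout fires, and could report which subtasks timed out so the WebUI can show a partial result explicitly.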

A detailed design document (FLIP) will follow this JIRA for community 
discussion.


