Hi everyone,

I would like to start a discussion on FLIP-570: Support Runtime Data Sampling for Operators with WebUI Visualization [1].

Inspecting intermediate data in a running Flink job is a common need across development, data exploration, and troubleshooting. Today, the only options are modifying the job (print() sink, log statements; all require a restart) or deploying external infrastructure (extra Kafka topics, debug sinks). Both are slow and disruptive for what is essentially a "what does the data look like here?" question.

FLIP-570 proposes native runtime data sampling, following the same proven architecture pattern as FlameGraph (FLINK-13550). The key ideas:

1. On-demand, round-scoped sampling at the output of any job vertex, triggered via REST API without job restart or topology modification.
2. A new "Data Sample" tab in the WebUI with auto-polling, subtask selector, and status-driven display.
3. Minimal overhead: zero when disabled; ~1.6% for the lightest ETL workloads when enabled-idle; <0.5% for typical production workloads.
4. Safety by default: disabled by default, with rate limiting, time budget, buffer caps, and round-scoped auto-disable.

For more details, please refer to the FLIP [1].

Looking forward to your feedback and thoughts!

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-570%3A+Support+Runtime+Data+Sampling+for+Operators+with+WebUI+Visualization

Best regards,
Jiangang Liu
