Thanks for the FLIP. It is useful for users. I have only one question:
could JobManager (JM) memory pressure under high-concurrency sampling
cause an OOM in large-scale jobs?

> On March 24, 2026, at 12:24, Jiangang Liu <[email protected]> wrote:
> 
> Hi everyone,
> 
> I would like to start a discussion on FLIP-570: Support Runtime Data
> Sampling for Operators with WebUI Visualization [1].
> 
> Inspecting intermediate data in a running Flink job is a common need
> across development, data exploration, and troubleshooting. Today, the
> only options are modifying the job (print() sink, log statements — all
> require a restart) or deploying external infrastructure (extra Kafka
> topics, debug sinks). Both are slow and disruptive for what is
> essentially a "what does the data look like here?" question.
> 
> FLIP-570 proposes native runtime data sampling, following the same
> proven architecture pattern as FlameGraph (FLINK-13550). The key ideas:
> 
>   1. On-demand, round-scoped sampling at the output of any job vertex,
>   triggered via REST API without job restart or topology modification.
>   2. A new "Data Sample" tab in the WebUI with auto-polling, subtask
>   selector, and status-driven display.
>   3. Minimal overhead: zero when disabled; ~1.6% for the lightest ETL
>   workloads when enabled-idle; <0.5% for typical production workloads.
>   4. Safety by default: disabled by default, with rate limiting, time
>   budget, buffer caps, and round-scoped auto-disable.
> 
> For more details, please refer to the FLIP [1].
> 
> Looking forward to your feedback and thoughts!
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-570%3A+Support+Runtime+Data+Sampling+for+Operators+with+WebUI+Visualization
> 
> Best regards,
> Jiangang Liu
