Gaël Renoux created FLINK-20390:
-----------------------------------
Summary: Programmatic access to the back-pressure
Key: FLINK-20390
URL: https://issues.apache.org/jira/browse/FLINK-20390
Project: Flink
Issue Type: New Feature
Components: API / Core
Reporter: Gaël Renoux
It would be useful to access the back-pressure monitoring from within functions.
Here is our use case: we have a real-time Flink job, which takes decisions
based on input data. Sometimes, we have traffic spikes on the input and the
decisions process cannot processe records fast enough. Back-pressure starts
mounting, all the way back to the Source. What we want to do is to start
dropping records in this case, because it's better to make decisions based on
just a sample of the data rather than accumulate too much lag.
Right now, the only way is to have a filter with a hard-limit on the number of
records per-interval-of-time, and to drop records once we are over this limit.
However, this requires a lot of tuning to find out what the correct limit is,
especially since it may depend on the nature of the inputs (some decisions take
longer to make than others). It's also heavily dependent on the buffers: the
limit needs to be low enough that all records that pass the limit can fit in
the downstream buffers, or the back-pressure will will go back past the
filtering task and we're back to square one. Finally, it's not very resilient
to change: whenever we scale the infrastructure up, we need to redo the whole
tuning thing.
With programmatic access to the back-pressure, we could simply start dropping
records based on its current level. No tuning, and adjusted to the actual
issue. For performance, I assume it would be better if it reused the existing
back-pressure monitoring mechanism, rather than looking directly into the
buffer. A sampling of the back-pressure should be enough, and if more precision
is needed you can simply change the existing back-pressure configuration.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)