Thanks, Xintong Song, for the detailed supplement. Since Flink jobs are
long-running, Flink resembles many online services, so interacting with or
controlling a running job is a common need. This was our initial thought
when implementing the feature. In our internal Flink, many configs set in
the YAML file can be adjusted dynamically to avoid restarting the job, for
example:

   1. Limit the input QPS.
   2. Degrade the job by sampling and so on.
   3. Reset Kafka offsets in certain cases.
   4. Stop checkpointing in certain cases.
   5. Control the consumption of historical data.
   6. Change the log level for debugging.
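
As a rough illustration of item 2 (a sketch only, not our internal
implementation): the operator logic reads a sampling ratio from a config
object that can be updated while the job runs. The names DynamicJobConfig,
SamplingFilter, DynamicConfigDemo and the key "sample.ratio" are
hypothetical. The sketch is plain Java with no Flink APIs, so it leaves out
how an update reaches the operator and how it interacts with checkpoints.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder for settings that may change while the job runs. */
final class DynamicJobConfig {
    private final Map<String, String> values = new ConcurrentHashMap<>();

    DynamicJobConfig(Map<String, String> initial) {
        values.putAll(initial);
    }

    /** Applies an update, e.g. one delivered by a control request. */
    void update(String key, String value) {
        values.put(key, value);
    }

    double getDouble(String key, double defaultValue) {
        String raw = values.get(key);
        return raw == null ? defaultValue : Double.parseDouble(raw);
    }
}

/** Hypothetical operator logic: keeps a configurable fraction of records. */
final class SamplingFilter {
    private final DynamicJobConfig config;
    private double credit = 0.0;

    SamplingFilter(DynamicJobConfig config) {
        this.config = config;
    }

    /** Returns true if the current record should be kept. */
    boolean accept() {
        // Re-read the ratio for every record so an update applies at once.
        credit += config.getDouble("sample.ratio", 1.0);
        if (credit >= 1.0) {
            credit -= 1.0;
            return true;
        }
        return false;
    }
}

final class DynamicConfigDemo {
    /** Feeds the given number of records through the filter. */
    static int countKept(SamplingFilter filter, int records) {
        int kept = 0;
        for (int i = 0; i < records; i++) {
            if (filter.accept()) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        DynamicJobConfig config =
                new DynamicJobConfig(Collections.singletonMap("sample.ratio", "1.0"));
        SamplingFilter filter = new SamplingFilter(config);

        System.out.println("kept before update: " + countKept(filter, 4));
        // Degrade the job without a restart: keep every second record.
        config.update("sample.ratio", "0.5");
        System.out.println("kept after update: " + countKept(filter, 4));
    }
}
```

The ratio is looked up per record to keep the sketch simple. A real
operator would more likely cache the value and refresh it when a control
event arrives.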


After in-depth discussion, we realized that a common control flow will
benefit both users and developers. Dynamic config is just one of the use
cases. The concrete design and implementation touch many components, such
as the JobMaster, network channels, and operators, so they need deeper
consideration.

Xintong Song [via Apache Flink User Mailing List archive.] <
ml+s2336050n44245...@n4.nabble.com> wrote on Mon, Jun 7, 2021 at 2:52 PM:

> Thanks Jiangang for bringing this up, and Steven & Peter for the feedback.
>
> I was part of the preliminary offline discussions before this proposal
> went public. So maybe I can help clarify things a bit.
>
> In short, although the phrase "control mode" might be a bit misleading,
> what we truly want to do from my side is to make the concept of "control
> flow" explicit and expose it to users.
>
> ## Background
> Jiangang & his colleagues at Kuaishou maintain an internal version of
> Flink. One of their custom features is allowing dynamically changing
> operator behaviors via the REST APIs. He's willing to contribute this
> feature to the community, and came to Yun Gao and me for suggestions. After
> discussion, we feel that the underlying question to be answered is how do
> we model the control flow in Flink. Dynamically controlling jobs via REST
> API can be one of the features built on top of the control flow, and there
> could be others.
>
> ## Control flow
> Control flow refers to the communication channels for sending
> events/signals to/between tasks/operators that change Flink's behavior in
> a way that may or may not affect the computation logic. Typical control
> events/signals Flink currently has are watermarks and checkpoint barriers.
>
> In general, for modeling control flow, the following questions should be
> considered.
> 1. Who (which component) is responsible for generating the control
> messages?
> 2. Who (which component) is responsible for reacting to the messages?
> 3. How do the messages propagate?
> 4. When it comes to affecting the computation logic, how should the
> control flow work together with exactly-once consistency?
>
> 1) & 2) may vary depending on the use cases, while 3) & 4) probably share
> many things in common. A unified control flow model would help deduplicate
> the common logic, allowing us to focus on the use-case-specific parts.
>
> E.g.,
> - Watermarks: generated by source operators, handled by window operators.
> - Checkpoint barrier: generated by the checkpoint coordinator, handled by
> all tasks
> - Dynamic controlling: generated by JobMaster (in reaction to the REST
> command), handled by specific operators/UDFs
> - Operator defined events: The following features are still in planning,
> but may potentially benefit from the control flow model. (Please correct me
> if I'm wrong, @Yun, @Jark)
>   * Iteration: When a certain condition is met, we might want to signal
> downstream operators with an event
>   * Mini-batch assembling: Flink currently uses special watermarks for
> indicating the end of each mini-batch, which makes it tricky to deal with
> event time related computations.
>   * Hive dimension table join: For periodically reloaded Hive tables, it
> would be helpful to have specific events signaling that a reload has
> finished.
>   * Bootstrap dimension table join: This is similar to the previous one.
> In cases where we want to fully load the dimension table before starting
> to join the main stream, it would be helpful to have an event signaling
> that the bootstrap has finished.
>
> ## Dynamic REST controlling
> Back to the specific feature that Jiangang proposed, I personally think
> it's quite convenient. Currently, to dynamically change the behavior of an
> operator, we need to set up a separate source for the control events and
> leverage broadcast state. Being able to send the events via REST APIs
> definitely improves the usability.
>
> Leveraging dynamic configuration frameworks is for sure one possible
> approach. The reasons we are in favor of introducing the control flow
> are:
> - It benefits not only this specific dynamic controlling feature, but
> potentially other future features as well.
> - AFAICS, it's non-trivial to make a 3rd-party dynamic configuration
> framework work together with Flink's consistency mechanism.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Jun 7, 2021 at 11:05 AM 刘建刚 <[hidden email]> wrote:
>
>> Thank you for the reply. I have checked the post you mentioned. Dynamic
>> config may be useful sometimes, but it is hard to keep data consistent
>> in Flink. For example, what happens if a dynamic config takes effect
>> during a failover? Since dynamic config is something users want, maybe
>> Flink can support it in some way.
>>
>> As for the control mode, dynamic config is just one of the control modes.
>> In the Google doc, I have listed some other cases. For example, control
>> events can be generated in operators or external services. Besides the
>> user's dynamic config, the Flink system can support some common dynamic
>> configuration, like QPS limits, checkpoint control and so on.
>>
>> Handling the control mode structure needs a good design. Based on that,
>> other control features can be added easily later, like changing the log
>> level while the job is running. In the end, Flink will not just process
>> data, but also interact with users to receive control events like a
>> service.
>>
>> Steven Wu <[hidden email]> wrote on Fri, Jun 4, 2021 at 11:11 PM:
>>
>>> I am not sure if we should solve this problem in Flink. This is more
>>> like a dynamic config problem that probably should be solved by some
>>> configuration framework. Here is one post from a Google search:
>>> https://medium.com/twodigits/dynamic-app-configuration-inject-configuration-at-run-time-using-spring-boot-and-docker-ffb42631852a
>>>
>>> On Fri, Jun 4, 2021 at 7:09 AM 刘建刚 <[hidden email]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>>       Flink jobs are always long-running. While a job is running,
>>>> users may want to control it without stopping it. The reasons for
>>>> control vary, for example:
>>>>
>>>>    1. Change the data processing logic, such as a filter condition.
>>>>    2. Send trigger events to move the progress forward.
>>>>    3. Define some tools to degrade the job, such as limiting the input
>>>>    QPS or sampling data.
>>>>    4. Change the log level to debug a current problem.
>>>>
>>>>       The common way to do this is to stop the job, make the
>>>> modifications and start the job again. It may take a long time to
>>>> recover. In some situations, stopping jobs is intolerable, for example
>>>> when the job is related to money or important activities. So we need
>>>> some way to control the running job without stopping it.
>>>>
>>>>
>>>> We propose to add a control mode to Flink. A control mode based on a
>>>> RESTful interface is introduced first. It works in these steps:
>>>>
>>>>
>>>>    1. The user predefines some logic that supports config control,
>>>>    such as a filter condition.
>>>>    2. Run the job.
>>>>    3. If the user wants to change the job's running logic, they just
>>>>    send a RESTful request with the corresponding config.
>>>>
>>>> Other control modes will also be considered in the future. More
>>>> details can be found in the doc
>>>> https://docs.google.com/document/d/1WSU3Tw-pSOcblm3vhKFYApzVkb-UQ3kxso8c8jEzIuA/edit?usp=sharing
>>>> . If the community likes the proposal, more discussion is needed and a more
>>>> detailed design will be given later. Any suggestions and ideas are welcome.
>>>>
>>>>
>
