Hi all,

Thanks to Kostas for reminding me that as early as March 2017 the
community had a thread called "Future of Queryable State Feature" [1].
That thread already discussed queryable state and how to make window
state queryable. I still think it can offer many advantages, especially
for ad-hoc queries.

Best,
Vino

[1]: http://mail-archives.apache.org/mod_mbox/flink-dev/201703.mbox/%3C362C780C-9672-4DBD-B3F1-4EE7D1DB4CA6%40apache.org%3E

mayozhang <18124766...@163.com> wrote on Thu, Jul 4, 2019 at 10:21 PM:

> It's a good idea to get the processing information of a large ongoing
> window. +1 from my side.
>
> > On Jul 4, 2019, at 11:41, vino yang <yanghua1...@gmail.com> wrote:
> >
> > Hi folks,
> >
> > Currently, queryable state is not widely used in production. IMO,
> > there are two key reasons for this. 1) The queryable state client is
> > hard to use, because it requires users to know the address of the
> > TaskManager and the port of the proxy. In practice, most business
> > users do not have good knowledge of Flink's internals and runtime in
> > production. 2) The benefit of this feature has not been fully
> > explored. In the Flink DataStream API, state is a first-class
> > citizen; it is Flink's key advantage compared with other compute
> > engines, and queryable state is the most effective way to inspect
> > the latest computing progress.
> >
> > Three months ago, I started a discussion about improving queryable
> > state and introducing a proxy component [1]. It attracted a lot of
> > attention and discussion. Recently, I submitted a design document
> > for the proposal [2]. These efforts try to address the first problem.
> >
> > As for the second problem, the most essential solution is to make
> > queryable state really work. The window operator is one of the most
> > valuable and most frequently used of all Flink operators. It also
> > uses keyed state, which is queryable. So we propose to let the state
> > of the window operator be queried. This would not only increase the
> > value of queryable state but also meet real business needs.
> >
> > IMO, allowing window state to be queried will provide great value.
> > In many scenarios, we use large windows for aggregate calculations.
> > A very common example is a day-level window that counts the PV (page
> > views) of a day. But users are usually not satisfied with waiting
> > until the end of the window to get the result. They want to get
> > "intermediate results" at a smaller time granularity to analyze
> > trends. Because Flink does not provide periodic triggers for fixed
> > windows, we have extended this and implemented an "incremental
> > window". It can trigger a fixed window at a smaller interval and
> > emit intermediate results. However, we believe that this approach is
> > still not flexible enough. We should let the user query the current
> > calculation result of the window through the API at any time.
> >
> > However, I know that if we want to implement it, there are still
> > some details that need to be discussed, such as how to let users
> > know the state descriptors used in the window, the namespace, and so
> > on.
> >
> > This discussion thread is mainly to hear the community's opinion on
> > this proposal.
> >
> > Any feedback and ideas are welcome and appreciated.
> >
> > Best,
> > Vino
> >
> > [1]:
> > http://mail-archives.apache.org/mod_mbox/flink-dev/201907.mbox/%3ctencent_35a56d6858408be2e2064...@qq.com%3E
> > [2]:
> > https://docs.google.com/document/d/181qYVIiHQGrc3hCj3QBn1iEHF4bUztdw4XO8VSaf_uI/edit?usp=sharing