Re: [External] [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

Timo Walther Mon, 03 Apr 2023 07:01:59 -0700

Hi Jane,

thanks for proposing this FLIP. More state insights and fine-grainedstate TTL are a frequently requested feature for Flink SQL. Eventually,we need to address this.

I agree with the previous responses that doing this with a hint mightcause more confusion than it actually helps. We should use hints only ifthey can be placed close to an operation (e.g. JOIN or table). And onlywhere a global flag for the entire query is not sufficient using SET.

In general, I support the current direction of the FLIP and continuingthe vision of FLIP-190. However, actually fine-grained state TTL shouldalready be possible today. Maybe this is untested yet, but we largelyreworked how configuration works within the planner in Flink 1.15.

As you quickly mentioned in the FLIP, ExecNodeConfig[1] already combinesconfiguration coming from TableConfig with per-ExecNode config.Actually, state TTL from JSON plan should already have higher precedencethan TableConfig.

It would be great to extend the meta-information of ExecNodes with stateinsights. I don't fully understand where your proposed StateMetadata islocated? Would this be a value of @ExecNodeMetadata, StreamExecNode, orTwoInputStreamOperator?

I think it should be a combination of ExecNodeMetadata with roughestimates (declaration) or StreamExecNode. But should not bubble intoTwoInputStreamOperator.


Regards,
Timo

[1]https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/ExecNodeConfig.java


On 03.04.23 09:15, godfrey he wrote:

Hi Jane,

Thanks for driving this FLIP.

I think the compiled plan solution and the hint solution do not
conflict, the two can exist at the same time.
The compiled plan solution can address the need of advanced users and
the platform users
which all stateful operators' state TTL can be defined by user. While
the hint solution can address some
  specific simple scenarios, which is very user-friendly, convenient,
and unambiguous to use.

Some stateful operators are not compiled from SQL directly, such as
ChangelogNormalize and
SinkUpsertMaterializer mentioned above,  I notice the the example given by Yisha
has hints propagation problem which does not conform to the current design.
The rough idea about the hint solution should be simple (only the
common operators are supported)
and easy to understand (no hints propagation).

If the hint solution is supported, a compiled plan which is from a
query with state TTL hints
  can also be further modified for the state TTL parts.

So, I prefer the hint solution to be discuss in a separate FLIP.  I
think that FLIP maybe
need a lot discussion.

Best,
Godfrey

周伊莎 <[email protected]> 于2023年3月30日周四 22:04写道：


Hi Jane,

Thanks for your detailed response.

You mentioned that there are 10k+ SQL jobs in your production

environment, but only ~100 jobs' migration involves plan editing. Is 10k+
the number of total jobs, or the number of jobs that use stateful
computation and need state migration?


10k is the number of SQL jobs that enable periodic checkpoint. And
surely if users change their sql which result in changes of the plan, they
need to do state migration.

- You mentioned that "A truth that can not be ignored is that users

usually tend to give up editing TTL(or operator ID in our case) instead of
migrating this configuration between their versions of one given job." So
what would users prefer to do if they're reluctant to edit the operator
ID? Would they submit the same SQL as a new job with a higher version to
re-accumulating the state from the earliest offset?



You're exactly right. People will tend to re-accumulate the state from a
given offset by changing the namespace of their checkpoint.
Namespace is an internal concept and restarting the sql job in a new
namespace can be simply understood as submitting a new job.

Back to your suggestions, I noticed that FLIP-190 [3] proposed the

following syntax to perform plan migration



The 'plan migration'  I said in my last reply may be inaccurate.  It's more
like 'query evolution'. In other word, if a user submitted a sql job with a
configured compiled plan, and then
he changes the sql,  the compiled plan changes too, how to move the
configuration in the old plan to the new plan.
IIUC, FLIP-190 aims to solve issues in flink version upgrades and leave out
the 'query evolution' which is a fundamental change to the query. E.g.
adding a filter condition, a different aggregation.
And I'm really looking forward to a solution for query evolution.

And I'm also curious about how to use the hint

approach to cover cases like

- configuring TTL for operators like ChangelogNormalize,
SinkUpsertMaterializer, etc., these operators are derived by the planner
implicitly
- cope with two/multiple input stream operator's state TTL, like join,
and other operations like row_number, rank, correlate, etc.



  Actually, in our company , we make operators in the query block where the
hint locates all affected by that hint. For example,

INSERT INTO sink

SELECT /*+ STATE_TTL('1D') */
    id,
    name,
    num
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY num DESC) as row_num
    FROM (
        SELECT
            *
        FROM (
            SELECT
                id,
                name,
                max(num) as num
            FROM source1
            GROUP BY
                id, name, TUMBLE(proc, INTERVAL '1' MINUTE)
        )
        GROUP BY
            id, name, num
    )
)
WHERE row_num = 1


In the SQL above, the state TTL of Rank and Agg will be all configured as 1
day.  If users want to set different TTL for Rank and Agg, they can just
make these two queries located in two different query blocks.
It looks quite rough but straightforward enough.  For each side of join
operator, one of my users proposed a syntax like below:

/*+ 
JOIN_TTL('tables'='left_talbe,right_table','left_ttl'='100000','right_ttl'='10000')
 */

We haven't accepted this proposal now, maybe we could find some better

design for this kind of case. Just for your information.

I think if we want to utilize hints to support fine-grained configuration,
we can open a new FLIP to discuss it.
BTW, personally, I'm interested in how to design a graphical interface to
help users to maintain their custom fine-grained configuration between
their job versions.

Best regards,
Yisha

Re: [External] [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

Reply via email to