[ https://issues.apache.org/jira/browse/FLINK-26348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497317#comment-17497317 ]

Jark Wu commented on FLINK-26348:
---------------------------------

+1 for this. Projection pushdown should also be able to pass through 
ChangelogNormalize.

> Maybe ChangelogNormalize should ignore unused columns when deduplicate
> ----------------------------------------------------------------------
>
>                 Key: FLINK-26348
>                 URL: https://issues.apache.org/jira/browse/FLINK-26348
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.13.2
>            Reporter: Kenyore
>            Priority: Major
>
> In my case I have the following tables:
>  * sku(size:1K+)
>  * custom_product(size:10B+)
>  * order(size:100M+)
> And my SQL is like:
> {code:sql}
> SELECT o.code,o.created,s.sku_name,p.product_name FROM order o 
>     INNER JOIN custom_product p ON o.p_id=p.id
>     INNER JOIN sku s ON s.id=p.s_id
> {code}
> Table sku has some other columns.
> The problem is that when another column (such as description) in any row of 
> table sku changes, Flink may produce millions of update rows which are 
> useless downstream, because the downstream query only reads the sku_name 
> column while the change was to the description column.
> These useless update rows put pressure on downstream operators.
> I think it would be a significant improvement for Flink to address this. Thanks.
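The improvement the reporter asks for can be sketched in a few lines. This is a hypothetical, simplified model of the idea, not Flink's actual ChangelogNormalize operator: keep the last emitted value per key, but compare only the columns the downstream query projects, so a change confined to an unused column (like description) produces no update row. The function names and column names here are illustrative assumptions.

```python
# Hypothetical sketch of ChangelogNormalize deduplicating on projected
# columns only (NOT Flink's real implementation).

def make_normalizer(projected_columns):
    """Return a stateful per-key normalizer that emits an update only
    when one of the projected columns actually changed."""
    last_seen = {}  # key -> tuple of last emitted projected values

    def normalize(key, row):
        projected = tuple(row[c] for c in projected_columns)
        if last_seen.get(key) == projected:
            # The change only touched columns the downstream never
            # reads (e.g. description), so suppress the update row.
            return None
        last_seen[key] = projected
        return projected

    return normalize

# Downstream only selects sku_name, so only sku_name participates
# in deduplication.
normalize = make_normalizer(["sku_name"])
print(normalize("sku-1", {"sku_name": "Widget", "description": "old"}))
print(normalize("sku-1", {"sku_name": "Widget", "description": "new"}))  # suppressed
print(normalize("sku-1", {"sku_name": "Gadget", "description": "new"}))
```

With today's behavior, the second call would still emit an update that every downstream operator has to process; deduplicating on the projected columns drops it at the source of the changelog.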



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
