[ https://issues.apache.org/jira/browse/FLINK-31240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726219#comment-17726219 ]
Dong Lin commented on FLINK-31240:
----------------------------------
Merged to the master branch as commit 6b6df3db466d6a030d5a38ec786ac3297cb41c38.
> Reduce the overhead of conversion between DataStream and Table
> --------------------------------------------------------------
>
> Key: FLINK-31240
> URL: https://issues.apache.org/jira/browse/FLINK-31240
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / API
> Reporter: Jiang Xin
> Assignee: Yunfeng Zhou
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.18.0
>
>
> In some cases, users may need to convert a DataStream to a Table and then
> convert it back to a DataStream (e.g., some Flink ML libraries accept a Table
> as input and convert it to a DataStream for computation). This causes
> unnecessary overhead because the data is converted between the internal data
> type and the external data type.
> We can reduce the overhead by checking whether a `fromDataStream` call is
> paired with a `toDataStream` call without any transformation in between and,
> if so, using the source DataStream directly.
> The performance of Flink ML's Bucketizer algorithm[1] is used to demonstrate
> the impact of this optimization. The execution time is obtained by taking the
> median execution time across 5 runs for each setup.
> Before optimization: 40746ms
> After optimization: 12972ms
> Thus, this optimization reduces the total execution time of Flink ML's
> Bucketizer algorithm to about 1/3 of the original.
> [1] https://github.com/apache/flink-ml/blob/master/flink-ml-benchmark/src/main/resources/bucketizer-benchmark.json
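
The pairing check described in the quoted issue can be illustrated with a toy model. This is only a sketch under stated assumptions: the Node class, the Kind enum, and elideRoundTrip are hypothetical names invented for illustration and are not Flink planner classes or the code in the merged commit. It shows the idea that a toDataStream sitting directly on top of a fromDataStream can be replaced by the source stream, while any transformation in between keeps both conversions.

```java
final class RoundTripSketch {

    /** Hypothetical node kinds; these do not mirror Flink's internal operators. */
    enum Kind { SOURCE, FROM_DATA_STREAM, TABLE_OP, TO_DATA_STREAM }

    /** Hypothetical single-input plan node. */
    static final class Node {
        final Kind kind;
        final Node input; // null for SOURCE

        Node(Kind kind, Node input) {
            this.kind = kind;
            this.input = input;
        }
    }

    static Node source() { return new Node(Kind.SOURCE, null); }
    static Node fromDataStream(Node in) { return new Node(Kind.FROM_DATA_STREAM, in); }
    static Node tableOp(Node in) { return new Node(Kind.TABLE_OP, in); }
    static Node toDataStream(Node in) { return new Node(Kind.TO_DATA_STREAM, in); }

    /**
     * If toDataStream sits directly on fromDataStream, return the original
     * source node so no conversion runs; otherwise return the node unchanged.
     */
    static Node elideRoundTrip(Node node) {
        if (node.kind == Kind.TO_DATA_STREAM
                && node.input != null
                && node.input.kind == Kind.FROM_DATA_STREAM) {
            return node.input.input;
        }
        return node;
    }

    /** Counts the conversion nodes left on the path down to the source. */
    static int conversions(Node node) {
        int count = 0;
        for (Node cur = node; cur != null; cur = cur.input) {
            if (cur.kind == Kind.FROM_DATA_STREAM || cur.kind == Kind.TO_DATA_STREAM) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node src = source();
        Node paired = toDataStream(fromDataStream(src));
        Node transformed = toDataStream(tableOp(fromDataStream(src)));

        System.out.println(conversions(elideRoundTrip(paired)));      // prints 0
        System.out.println(conversions(elideRoundTrip(transformed))); // prints 2
    }
}
```

In the paired case the source node itself is returned, so neither conversion is executed. In the transformed case the plan is left untouched because the table operation needs the internal data type.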