[
https://issues.apache.org/jira/browse/FLINK-14676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972014#comment-16972014
]
Jark Wu commented on FLINK-14676:
---------------------------------
Hi [~lzljs3620320], I'm not suggesting that the framework set the parallelism
of the DataStream produced by a StreamTableSource. I suggested letting
connectors configure their parallelism themselves. I don't know whether that
would solve your problem, because you didn't mention the background requirement
in the JIRA. What I want to avoid is introducing temporary APIs.
As you said, the parallelism inference framework was reverted before because we
missed something. IMO, parallelism inference is a big topic and should be
designed thoroughly. I also have some questions about the parallelism inference
for InputFormatTableSource (according to the PR):
1) Why is the parallelism inferred as row_count/rows_per_partition? What if the
rowCount is missing or wrong? How do we guarantee that each partition processes
that number of rows? And what if it is not a partitioned source?
2) The configuration is not applied in streaming mode. This may cause streaming
and batch behavior to diverge.
3) I think a more intuitive way is to expose a configuration option that sets
the source parallelism directly. How would it interact with the
rows_per_partition configuration?
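To make questions 1) and 3) concrete, here is a minimal sketch of one possible
precedence rule. It is not the code from the PR, and every name in it
(SourceParallelismSketch, chooseParallelism, explicitParallelism,
defaultParallelism, maxParallelism) is hypothetical. It assumes that an
explicitly configured source parallelism wins, that row_count/rows_per_partition
is used only when a positive row count is available, and that a default is used
otherwise. It does not answer whether each partition really processes that
number of rows.

```java
import java.util.OptionalInt;
import java.util.OptionalLong;

/** Hypothetical helper; not part of Flink. */
public final class SourceParallelismSketch {

    private SourceParallelismSketch() {}

    /**
     * Chooses a source parallelism under the assumptions stated above:
     * explicit setting first, then row-count inference, then a default.
     */
    public static int chooseParallelism(
            OptionalInt explicitParallelism,
            OptionalLong rowCount,
            long rowsPerPartition,
            int defaultParallelism,
            int maxParallelism) {
        if (rowsPerPartition <= 0 || defaultParallelism <= 0 || maxParallelism <= 0) {
            throw new IllegalArgumentException("all limits must be positive");
        }
        // Assumption: a directly configured parallelism overrides inference.
        if (explicitParallelism.isPresent()) {
            return clamp(explicitParallelism.getAsInt(), maxParallelism);
        }
        // Missing or non-positive statistics: fall back to the default.
        if (!rowCount.isPresent() || rowCount.getAsLong() <= 0) {
            return clamp(defaultParallelism, maxParallelism);
        }
        long rows = rowCount.getAsLong();
        // Ceiling division written so that it cannot overflow.
        long inferred = rows / rowsPerPartition + (rows % rowsPerPartition == 0 ? 0 : 1);
        return (int) Math.min(inferred, (long) maxParallelism);
    }

    // Keeps a value inside [1, max].
    private static int clamp(int value, int max) {
        return Math.max(1, Math.min(value, max));
    }
}
```

With this rule, 1,000,000 rows at 250,000 rows per partition would give a
parallelism of 4, while a wrong rowCount would still give a wrong parallelism,
which is why I think the statistics question needs an answer first.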
> Introduce parallelism inference for InputFormatTableSource
> ----------------------------------------------------------
>
> Key: FLINK-14676
> URL: https://issues.apache.org/jira/browse/FLINK-14676
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Planner
> Reporter: Jingsong Lee
> Assignee: Jingsong Lee
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> FLINK-12801 introduced a parallelism setting for tables, but because a
> TableSource generates a DataStream, and that DataStream may not be a real
> source, this could lead to shuffle errors. So FLINK-13494 removed that
> implementation.
> In this ticket, I would like to introduce parallelism inference only for
> InputFormatTableSource, because the RowCount of an InputFormatTableSource is
> more accurate than that of downstream stages. It is worth generating its
> parallelism automatically.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)