[jira] [Comment Edited] (FLINK-16903) Add sink.parallelism for file system factory

Jingsong Lee (Jira) Wed, 01 Apr 2020 04:29:39 -0700


    [ https://issues.apache.org/jira/browse/FLINK-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072665#comment-17072665 ]


Jingsong Lee edited comment on FLINK-16903 at 4/1/20, 11:28 AM:
----------------------------------------------------------------

[~twalthr] Good point, I see. Let's pause this Jira for now.

We could provide this ability in the planner instead of in the connector. However, 
connectors may also want to take part in parallelism inference. We need to 
think through the whole story for the resource management topic:
 # The user needs to be able to specify the parallelism of a source/sink.
 # Sources can take part in parallelism inference. For Kafka, parallelism is 
commonly inferred from the number of partitions; for Hive/FileSystem, it is 
commonly inferred from the number of splits.
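The two points above (explicit user setting vs. connector-driven inference) can be illustrated with a small sketch of one possible resolution policy. This is not Flink's API: the function name `infer_source_parallelism`, its parameters, and the cap `max_parallelism = 128` are hypothetical choices made only for illustration.

```python
from typing import Optional


def infer_source_parallelism(
    user_parallelism: Optional[int],
    num_partitions: Optional[int] = None,
    num_splits: Optional[int] = None,
    max_parallelism: int = 128,  # hypothetical cap, not a Flink default
) -> int:
    """Pick a source parallelism (hypothetical policy, not Flink's API)."""
    # 1. An explicit user setting always wins.
    if user_parallelism is not None:
        if user_parallelism < 1:
            raise ValueError("parallelism must be >= 1")
        return user_parallelism
    # 2. Otherwise let the connector infer it: from partitions
    #    (Kafka-like sources) or from splits (Hive/FileSystem-like sources).
    inferred = num_partitions if num_partitions is not None else num_splits
    if inferred is None or inferred < 1:
        return 1  # nothing to infer from; fall back to a single task
    # 3. Cap the result so a huge split count does not create too many tasks.
    return min(inferred, max_parallelism)
```

For example, a Kafka-like source with 12 partitions and no user setting would get 12, while a file source with 1000 splits would be capped at 128. Keeping the user override as the first check reflects point 1: inference only applies when the user has not decided.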



> Add sink.parallelism for file system factory
> --------------------------------------------
>
>                 Key: FLINK-16903
>                 URL: https://issues.apache.org/jira/browse/FLINK-16903
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Runtime
>            Reporter: Jingsong Lee
>            Priority: Major
>             Fix For: 1.11.0
>
>
> A single task may write multiple files at the same time. If the parallelism 
> is too high, it may produce a large number of small files; if it is too low, 
> write performance suffers. Users therefore need to be able to specify the 
> parallelism.
>  * By default, the parallelism is the same as that of the upstream transformation.
>  * Users can also specify the parallelism explicitly.
> |'connector.sink.parallelism' = ...|
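The default-vs-explicit rule in the issue description can be sketched as a small option lookup. This assumes the table options arrive as a string-to-string map and uses the key as proposed in the issue; the function name `resolve_sink_parallelism` and the validation behavior are hypothetical, not the actual file system factory code.

```python
from typing import Mapping

# Option key as proposed in the issue description.
SINK_PARALLELISM_KEY = "connector.sink.parallelism"


def resolve_sink_parallelism(options: Mapping[str, str], upstream_parallelism: int) -> int:
    """Return the parallelism a file system sink should run with (hypothetical helper)."""
    raw = options.get(SINK_PARALLELISM_KEY)
    if raw is None:
        # Default: same parallelism as the upstream transformation.
        return upstream_parallelism
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"'{SINK_PARALLELISM_KEY}' must be an integer, got {raw!r}"
        ) from None
    if value < 1:
        raise ValueError(f"'{SINK_PARALLELISM_KEY}' must be >= 1, got {value}")
    # Explicit user choice, e.g. a lower value to produce fewer, larger files.
    return value
```

With no option set and an upstream parallelism of 8, the sink keeps 8; setting the option to "2" lowers it to 2, trading write throughput for fewer small files.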



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
