[jira] [Updated] (FLINK-29861) Optimize logic of enumerateSplits in HiveSource

luoyuxia (Jira) Thu, 03 Nov 2022 01:04:08 -0700


     [ https://issues.apache.org/jira/browse/FLINK-29861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


luoyuxia updated FLINK-29861:
-----------------------------
    Description: 
Currently, the Hive source calls
`InputFormat#getSplits(JobConf conf, int minNumSplits)` to enumerate splits,
passing the source's parallelism as minNumSplits.

For text format, this can produce too many splits, each holding only a small
amount of data, so readers spend a disproportionate share of time fetching
splits rather than reading data.
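To illustrate why split count tracks parallelism, here is a simplified sketch of how Hadoop's old-API `org.apache.hadoop.mapred.FileInputFormat` (which text tables' `TextInputFormat` extends) turns the minNumSplits hint into a split size: the total input size is divided by the hint to get a goal size, which is then clamped to the block size. This is an assumption-laden model, not Flink's or Hive's code: the class `SplitSizeSketch`, the method `splitLengths`, and the input sizes are hypothetical, and the sketch ignores non-splittable files, block locations, and any configured minimum split size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch modelling Hadoop mapred FileInputFormat split sizing.
public class SplitSizeSketch {
    // Hadoop lets the last split of a file be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Same formula as FileInputFormat.computeSplitSize(goalSize, minSize, blockSize).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Returns the byte length of every split produced for the given files.
    static List<Long> splitLengths(long[] fileSizes, int minNumSplits, long blockSize) {
        long totalSize = 0;
        for (long size : fileSizes) {
            totalSize += size;
        }
        // The hint is spread over the total input size, not per file.
        long goalSize = totalSize / (minNumSplits == 0 ? 1 : minNumSplits);
        long minSize = 1; // assumption: no split.minsize configured
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);

        List<Long> splits = new ArrayList<>();
        for (long size : fileSizes) {
            long remaining = size;
            // Cut full-size splits while the remainder is clearly larger than one split.
            while ((double) remaining / splitSize > SPLIT_SLOP) {
                splits.add(splitSize);
                remaining -= splitSize;
            }
            // Whatever is left becomes the file's final split.
            if (remaining != 0) {
                splits.add(remaining);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;
        // Hypothetical input: 10 text files of 1 GiB each.
        long[] files = new long[10];
        Arrays.fill(files, 1024 * mb);
        for (int parallelism : new int[] {1, 100, 1000}) {
            List<Long> splits = splitLengths(files, parallelism, blockSize);
            System.out.printf("parallelism=%d -> %d splits of ~%d MiB%n",
                parallelism, splits.size(), splits.get(0) / mb);
        }
    }
}
```

Under these assumptions, a parallelism of 1 yields 80 block-sized (128 MiB) splits, while a parallelism of 1000 yields 1000 splits of roughly 10 MiB each: the split count grows with parallelism, and the data per split shrinks accordingly, which matches the behaviour described above.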

We may need to revisit the logic for enumerating splits.

  was:Currently, Hive source will call to emu


> Optimize logic of enumerateSplits in HiveSource 
> ------------------------------------------------
>
>                 Key: FLINK-29861
>                 URL: https://issues.apache.org/jira/browse/FLINK-29861
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Hive
>            Reporter: luoyuxia
>            Priority: Major
>
> Currently, the Hive source calls
> `InputFormat#getSplits(JobConf conf, int minNumSplits)` to enumerate splits,
> passing the source's parallelism as minNumSplits.
> For text format, this can produce too many splits, each holding only a small
> amount of data, so readers spend a disproportionate share of time fetching
> splits rather than reading data.
> We may need to revisit the logic for enumerating splits.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
