[
https://issues.apache.org/jira/browse/FLINK-29861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
luoyuxia updated FLINK-29861:
-----------------------------
Description:
Currently, the Hive source calls `InputFormat#getSplits(JobConf conf, int
minNumSplits)` to enumerate splits. The minNumSplits value passed is the
source's parallelism.
For text format, this can produce too many splits, each containing very little
data, so the reader spends a lot of time fetching splits and reading them.
We may need to revisit the logic for enumerating splits.
was:Currently, Hive source will call to emu
> Optimize logic of enumerateSplits in HiveSource
> ------------------------------------------------
>
> Key: FLINK-29861
> URL: https://issues.apache.org/jira/browse/FLINK-29861
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / Hive
> Reporter: luoyuxia
> Priority: Major
>
> Currently, the Hive source calls `InputFormat#getSplits(JobConf conf, int
> minNumSplits)` to enumerate splits. The minNumSplits value passed is the
> source's parallelism.
> For text format, this can produce too many splits, each containing very little
> data, so the reader spends a lot of time fetching splits and reading them.
> We may need to revisit the logic for enumerating splits.
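
To illustrate why a large minNumSplits can yield many small splits for text
format, here is a minimal, self-contained sketch. It assumes the text format
goes through Hadoop's mapred FileInputFormat, whose split-size rule is
max(minSize, min(totalSize / numSplits, blockSize)). The class name
SplitSizeSketch, its helper methods, and the file and block sizes are
hypothetical and chosen only for illustration; this is not the HiveSource or
Hadoop code, and it ignores the slop factor applied to the last split.

```java
// Simplified illustration only; not the Flink HiveSource or Hadoop source code.
final class SplitSizeSketch {

    // Split-size rule modeled on Hadoop's mapred FileInputFormat:
    // max(minSize, min(totalSize / numSplits, blockSize)).
    static long splitSize(long totalSize, int minNumSplits, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(minNumSplits, 1);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Approximate number of splits for a single file (ceiling division).
    // The real implementation also applies a slop factor to the last split.
    static long splitCount(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long fileSize = 1024 * mib;  // hypothetical: one 1 GiB text file
        long blockSize = 128 * mib;  // hypothetical block size
        long minSize = 1;            // hypothetical minimum split size

        // The source's parallelism is passed as minNumSplits.
        for (int parallelism : new int[] {4, 64, 1024}) {
            long size = splitSize(fileSize, parallelism, minSize, blockSize);
            System.out.printf("parallelism=%d splitSize=%d bytes splits=%d%n",
                    parallelism, size, splitCount(fileSize, size));
        }
    }
}
```

Under these assumptions, the split size stays at the block size while the
parallelism is small, but shrinks in proportion once totalSize / parallelism
drops below the block size, so the number of splits grows with the parallelism.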