Yinsheng Wang created HUDI-9308:
-----------------------------------
Summary: Fix incorrect usage of
mapreduce.input.fileinputformat.split.maxsize in HoodieCombineHiveInputFormat
Key: HUDI-9308
URL: https://issues.apache.org/jira/browse/HUDI-9308
Project: Apache Hudi
Issue Type: Bug
Reporter: Yinsheng Wang
"mapreduce.input.fileinputformat.split.maxsize" sets the maximum split size
in bytes. With CombineInputFormat, it caps the size of each CombineSplit.
In HoodieCombineHiveInputFormat, however, the value is treated as a count:
it limits how many original splits are combined into one CombineSplit,
regardless of how large those splits are.
{code:java}
int counter = 0;
for (int pos = 0; pos < splits.length; pos++) {
  // maxSize is compared against a running split count here, not against bytes
  if (counter == maxSize - 1 || pos == splits.length - 1) {
    builder.addSplit((FileSplit) splits[pos]);
    combineFileSplits.add(builder.build(job));
    builder = new HoodieCombineRealtimeFileSplit.Builder();
    counter = 0;
  } else if (counter < maxSize) {
    counter++;
    builder.addSplit((FileSplit) splits[pos]);
  }
}
{code}
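To illustrate the difference, here is a minimal sketch that contrasts count-based grouping (mirroring the loop above) with size-based grouping. It is not Hudi code and not a proposed patch: splits are modelled only by their length in bytes, and the class name SplitCombineSketch and the methods combineByCount and combineBySize are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Hudi or Hadoop.
// Each split is represented only by its length in bytes.
class SplitCombineSketch {

    // Mirrors the loop quoted above: maxSize bounds the NUMBER of splits per group.
    static List<List<Long>> combineByCount(long[] splitLengths, long maxSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long counter = 0;
        for (int pos = 0; pos < splitLengths.length; pos++) {
            if (counter == maxSize - 1 || pos == splitLengths.length - 1) {
                current.add(splitLengths[pos]);
                groups.add(current);
                current = new ArrayList<>();
                counter = 0;
            } else if (counter < maxSize) {
                counter++;
                current.add(splitLengths[pos]);
            }
        }
        return groups;
    }

    // Size-based alternative: maxSize bounds the total BYTES per group.
    // A single split larger than maxSize still gets a group of its own.
    static List<List<Long>> combineBySize(long[] splitLengths, long maxSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long length : splitLengths) {
            // Close the current group before it would exceed the byte limit.
            if (!current.isEmpty() && currentBytes + length > maxSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(length);
            currentBytes += length;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

With four splits of 100 bytes each and a limit of 250, combineByCount places all four splits in one group, because the counter never gets near 250, while combineBySize produces two groups of two splits each.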