Yinsheng Wang created HUDI-9308:
-----------------------------------

             Summary: Fix incorrect usage of mapreduce.input.fileinputformat.split.maxsize in HoodieCombineHiveInputFormat
                 Key: HUDI-9308
                 URL: https://issues.apache.org/jira/browse/HUDI-9308
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Yinsheng Wang


“mapreduce.input.fileinputformat.split.maxsize” specifies the maximum split 
size in bytes. With CombineInputFormat, it limits the size of each 
CombineSplit. In HoodieCombineHiveInputFormat, however, the value is compared 
against a split counter, so it instead controls how many original splits are 
combined into one CombineSplit.
{code:java}
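// Excerpt from HoodieCombineHiveInputFormat: maxSize comes from the property
// above, but is used here as a number of splits rather than a size in bytes.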
int counter = 0;
for (int pos = 0; pos < splits.length; pos++) {
  if (counter == maxSize - 1 || pos == splits.length - 1) {
    builder.addSplit((FileSplit) splits[pos]);
    combineFileSplits.add(builder.build(job));
    builder = new HoodieCombineRealtimeFileSplit.Builder();
    counter = 0;
  } else if (counter < maxSize) {
    counter++;
    builder.addSplit((FileSplit) splits[pos]);
  }
}
{code}
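
For comparison, here is a minimal sketch of what size-based combining could look like. It is an illustration only, not the actual fix: SizeBasedCombiner and combineBySize are hypothetical names (not Hudi or Hadoop APIs), the sketch works on plain byte lengths instead of FileSplit objects, and it assumes that a limit of 0 or less means "no limit".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not part of Hudi or Hadoop. */
final class SizeBasedCombiner {

  /**
   * Groups split lengths (in bytes) so that each group's total stays within
   * maxSizeBytes. A single split larger than the limit gets its own group.
   * Assumption of this sketch: maxSizeBytes <= 0 means "no limit".
   */
  static List<List<Long>> combineBySize(long[] splitLengths, long maxSizeBytes) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentBytes = 0L;
    for (long length : splitLengths) {
      // Close the current group if adding this split would exceed the byte limit.
      if (!current.isEmpty() && maxSizeBytes > 0 && currentBytes + length > maxSizeBytes) {
        groups.add(current);
        current = new ArrayList<>();
        currentBytes = 0L;
      }
      current.add(length);
      currentBytes += length;
    }
    // Flush the last, possibly partial, group.
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    long[] lengths = {100 * mb, 100 * mb, 100 * mb, 50 * mb};
    // With a 256 MB limit the four splits fall into two groups.
    System.out.println(combineBySize(lengths, 256 * mb).size()); // prints 2
  }
}
```

The counter-based loop above would instead try to place 268435456 original splits into one CombineSplit for the same 256 MB setting, which is why the value needs to be compared against accumulated bytes.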



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
