xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728743555
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##########
@@ -119,6 +307,11 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
     addProjectionToJobConf(realtimeSplit, jobConf);
     LOG.info("Creating record reader with readCols :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
         + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
+
+    // for a log-only split we don't need a parquet reader, so set it to empty
+    if (FSUtils.isLogFile(realtimeSplit.getPath())) {
Review comment:
agree
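
The guard in the hunk above depends on recognizing a Hudi delta log file from its path alone. As a minimal, self-contained sketch of that idea, here is an extension-pattern check; the `LogFileCheck` class and the regex are simplified assumptions for illustration, not Hudi's actual `FSUtils.isLogFile` implementation (which matches the full log-file naming convention):

```java
import java.util.regex.Pattern;

public class LogFileCheck {
    // Simplified stand-in for Hudi's log-file naming pattern,
    // roughly ".<fileId>.log.<version>[_<writeToken>]" (an assumption here).
    private static final Pattern LOG_FILE_PATTERN =
        Pattern.compile("\\..*\\.log\\.\\d+.*");

    // Returns true when the file name looks like a delta log file,
    // so the caller can skip creating a parquet reader for it.
    public static boolean isLogFile(String fileName) {
        return LOG_FILE_PATTERN.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLogFile(".fileid0_100.log.1_1-0-1"));   // log file: true
        System.out.println(isLogFile("fileid0_1-0-1_100.parquet")); // base file: false
    }
}
```

With a check like this, a log-only split can be detected up front and the parquet-reader setup bypassed entirely, which is the intent of the `if` added in the patch.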
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -69,17 +70,18 @@
   private static final Logger LOG = LogManager.getLogger(HoodieRealtimeInputFormatUtils.class);

   public static InputSplit[] getRealtimeSplits(Configuration conf, Stream<FileSplit> fileSplits) throws IOException {
-    Map<Path, List<FileSplit>> partitionsToParquetSplits =
-        fileSplits.collect(Collectors.groupingBy(split -> split.getPath().getParent()));
+    // for all unique split parents, obtain all delta files based on delta commit timeline,
+    // grouped on file id
+    List<InputSplit> rtSplits = new ArrayList<>();
+    List<FileSplit> candidateFileSplits = fileSplits.collect(Collectors.toList());
+    Map<Path, List<FileSplit>> partitionsToParquetSplits = candidateFileSplits
Review comment:
ok
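
The change above first materializes the `Stream<FileSplit>` into a list and only then groups the splits by their parent (partition) path. A minimal sketch of that grouping step, using plain strings in place of Hadoop's `Path`/`FileSplit` (the `SplitGrouping` class and `groupByParent` helper are illustrative assumptions):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SplitGrouping {
    // Groups split paths by their parent directory, mirroring the patch's
    // Collectors.groupingBy(split -> split.getPath().getParent()).
    public static Map<String, List<String>> groupByParent(List<String> splitPaths) {
        return splitPaths.stream()
            .collect(Collectors.groupingBy(p -> p.substring(0, p.lastIndexOf('/'))));
    }

    public static void main(String[] args) {
        // Stand-in split paths; the parent directory plays the role of the partition path.
        List<String> candidates = Arrays.asList(
            "/table/2021/01/file-a.parquet",
            "/table/2021/01/file-b.parquet",
            "/table/2021/02/file-c.parquet");

        // The patch collects the incoming Stream<FileSplit> into a list before
        // grouping: a Java Stream is single-use, so keeping the list around lets
        // the method traverse the candidate splits again later.
        Map<String, List<String>> partitionsToSplits = groupByParent(candidates);
        System.out.println(partitionsToSplits.get("/table/2021/01").size()); // 2
    }
}
```

Collecting to a list first trades a little memory for the ability to re-iterate the splits, which matters once the method needs a second pass over the same candidates.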
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]