xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725744147
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java
##########
@@ -52,16 +49,19 @@
protected final RecordReader<NullWritable, ArrayWritable> parquetReader;
private final Map<String, HoodieRecord<? extends HoodieRecordPayload>> deltaRecordMap;
- private final Set<String> deltaRecordKeys;
private int recordKeyIndex = HoodieInputFormatUtils.HOODIE_RECORD_KEY_COL_POS;
- private Iterator<String> deltaItr;
+ private Option<RealTimeMergedRecordReader> logReader;
public RealtimeCompactedRecordReader(RealtimeSplit split, JobConf job,
RecordReader<NullWritable, ArrayWritable> realReader) throws IOException {
super(split, job);
this.parquetReader = realReader;
- this.deltaRecordMap = getMergedLogRecordScanner().getRecords();
- this.deltaRecordKeys = new HashSet<>(this.deltaRecordMap.keySet());
+ HoodieMergedLogRecordScanner hoodieMergedLogRecordScanner = getMergedLogRecordScanner();
+ this.deltaRecordMap = hoodieMergedLogRecordScanner.getRecords();
+ this.logReader = Option.empty();
+ if (FSUtils.isLogFile(split.getPath())) {
Review comment:
Yes, a RealtimeSplit can be any of the three cases. However, we only add
processing logic for case c, since cases a and b are already handled
correctly by the original processing logic.
FSUtils.isLogFile returns true only for case c; cases a and b continue to
use the original processing logic.
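
To make the branching concrete, here is a rough sketch of the idea rather than the actual PR code. The class `SplitDispatchSketch`, the enum `ReadPath`, and the helper `looksLikeLogFile` are all hypothetical names; in particular, `looksLikeLogFile` is only a simplified stand-in for `FSUtils.isLogFile` and assumes a log file can be recognized by `.log.` in its file name, which may not match the real check. The sample paths are made up as well.

```java
// Illustrative sketch only; none of these names are Hudi classes.
public class SplitDispatchSketch {

    // The two processing paths described in the comment above.
    enum ReadPath { ORIGINAL_MERGE, LOG_ONLY }

    // Hypothetical stand-in for FSUtils.isLogFile: treats any file whose
    // name contains ".log." as a log file. The real check may differ.
    static boolean looksLikeLogFile(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.contains(".log.");
    }

    // Only a split whose path is itself a log file takes the new branch;
    // every other split keeps the original processing logic.
    static ReadPath chooseReadPath(String splitPath) {
        return looksLikeLogFile(splitPath) ? ReadPath.LOG_ONLY : ReadPath.ORIGINAL_MERGE;
    }

    public static void main(String[] args) {
        // Made-up example paths, for illustration only.
        System.out.println(chooseReadPath("/tbl/p1/f1_0-1-0_001.parquet"));
        System.out.println(chooseReadPath("/tbl/p1/.f1_001.log.1_0-1-0"));
    }
}
```

Because the check is made on the split path alone, the two existing cases never enter the new branch and their behavior stays unchanged.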