xiarixiaoyao commented on a change in pull request #2721:
URL: https://github.com/apache/hudi/pull/2721#discussion_r601949535
##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java
##########
@@ -95,15 +103,24 @@ public boolean next(NullWritable aVoid, ArrayWritable arrayWritable) throws IOEx
// TODO(NA): Invoke preCombine here by converting arrayWritable to Avro. This is required since the
// deltaRecord may not be a full record and needs values of columns from the parquet
Option<GenericRecord> rec;
- if (usesCustomPayload) {
- rec = deltaRecordMap.get(key).getData().getInsertValue(getWriterSchema());
- } else {
- rec = deltaRecordMap.get(key).getData().getInsertValue(getReaderSchema());
+ rec = buildGenericRecordwithCustomPayload(deltaRecordMap.get(key));
+ // If the record is not present, this is a delete record using an empty payload so skip this base record
+ // and move to the next record
+ while (!rec.isPresent()) {
+ // if current parquet reader has no record, return false
+ if (!this.parquetReader.next(aVoid, arrayWritable)) {
Review comment:
@garyli1019 Thanks for your reply. No, we will not miss a record. When we call this.parquetReader.next(aVoid, arrayWritable) and it returns true, the Parquet reader automatically copies the new record into arrayWritable, as this excerpt from the Parquet reader source code shows:
public boolean next(final NullWritable key, final ArrayWritable value) throws IOException {
  // ...
  if (value != null && arrValue.length == arrCurrent.length) {
    // Copy the newly read row into the caller's ArrayWritable in place.
    System.arraycopy(arrCurrent, 0, arrValue, 0, arrCurrent.length);
  } else {
    if (arrValue.length != arrCurrent.length) {
      throw new IOException("DeprecatedParquetHiveInput : size of object differs. Value" +
          " size : " + arrValue.length + ", Current Object size : " + arrCurrent.length);
    } else {
      throw new IOException("DeprecatedParquetHiveInput can not support RecordReaders that" +
          " don't return same key & value & value is null");
    }
  }
  // ...
}
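A minimal, self-contained sketch of why the skip loop does not lose a row, under stated assumptions: SkipDeletesSketch, InPlaceReader and demo() are hypothetical names, plain Object[] stands in for Hadoop's ArrayWritable, and a Map whose Optional.empty() values mark deletes stands in for Hudi's empty delete payload. It is not Hudi's RealtimeCompactedRecordReader; it also assumes the loop re-reads the key of the newly copied row on each pass, which the diff excerpt above does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class SkipDeletesSketch {

  /** Hypothetical stand-in for the Parquet reader: next() copies the current row into the caller's array. */
  static class InPlaceReader {
    private final List<Object[]> rows;
    private int pos = 0;

    InPlaceReader(List<Object[]> rows) {
      this.rows = rows;
    }

    boolean next(Object[] value) {
      if (pos >= rows.size()) {
        return false; // base file exhausted
      }
      Object[] current = rows.get(pos++);
      if (value.length != current.length) {
        throw new IllegalStateException("size of object differs");
      }
      // Same idea as the System.arraycopy call quoted above: the caller's array is overwritten in place.
      System.arraycopy(current, 0, value, 0, current.length);
      return true;
    }
  }

  private final InPlaceReader baseReader;
  // Hypothetical delta log keyed by record key; Optional.empty() marks a delete (empty payload).
  private final Map<String, Optional<Object[]>> deltaRecords;

  SkipDeletesSketch(InPlaceReader baseReader, Map<String, Optional<Object[]>> deltaRecords) {
    this.baseReader = baseReader;
    this.deltaRecords = deltaRecords;
  }

  /** Fills value with the next visible record; returns false once the base file is exhausted. */
  boolean next(Object[] value) {
    if (!baseReader.next(value)) {
      return false;
    }
    while (true) {
      String key = (String) value[0]; // key of the row currently held in value
      Optional<Object[]> delta = deltaRecords.get(key);
      if (delta == null) {
        return true; // no log entry: emit the base row as-is
      }
      if (delta.isPresent()) {
        Object[] merged = delta.get();
        System.arraycopy(merged, 0, value, 0, merged.length);
        return true; // emit the updated row
      }
      // Delete: advance. next() has already written the following row into value,
      // so the next loop iteration checks that row instead of skipping it.
      if (!baseReader.next(value)) {
        return false;
      }
    }
  }

  /** k2 and k4 are deletes, k3 is an update; collects every visible row as "key=value". */
  static List<String> demo() {
    List<Object[]> base = Arrays.asList(
        new Object[] {"k1", 1}, new Object[] {"k2", 2}, new Object[] {"k3", 3}, new Object[] {"k4", 4});
    Map<String, Optional<Object[]>> deltas = new HashMap<>();
    deltas.put("k2", Optional.empty());
    deltas.put("k3", Optional.of(new Object[] {"k3", 30}));
    deltas.put("k4", Optional.empty());

    SkipDeletesSketch reader = new SkipDeletesSketch(new InPlaceReader(base), deltas);
    Object[] value = new Object[2];
    List<String> out = new ArrayList<>();
    while (reader.next(value)) {
      out.add(value[0] + "=" + value[1]);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [k1=1, k3=30]
  }
}
```

In this sketch k2 is dropped, yet k3, the row immediately after the delete, is still emitted with its update: advancing past the delete writes k3 into value before its key is checked, so no row falls through the loop.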
--
This is an automated message from the Apache Git Service.