prashantwason commented on a change in pull request #1687:
URL: https://github.com/apache/hudi/pull/1687#discussion_r435628047
##########
File path:
hudi-client/src/main/java/org/apache/hudi/table/action/commit/CommitActionExecutor.java
##########
@@ -89,11 +87,12 @@ public CommitActionExecutor(JavaSparkContext jsc,
       throw new HoodieUpsertException(
           "Error in finding the old file path at commit " + instantTime + " for fileId: " + fileId);
     } else {
-      AvroReadSupport.setAvroReadSchema(table.getHadoopConf(), upsertHandle.getWriterSchema());
       BoundedInMemoryExecutor<GenericRecord, GenericRecord, Void> wrapper = null;
-      try (ParquetReader<IndexedRecord> reader =
-          AvroParquetReader.<IndexedRecord>builder(upsertHandle.getOldFilePath()).withConf(table.getHadoopConf()).build()) {
-        wrapper = new SparkBoundedInMemoryExecutor(config, new ParquetReaderIterator(reader),
+      try {
+        HoodieStorageReader<IndexedRecord> storageReader =
+            HoodieStorageReaderFactory.getStorageReader(table.getHadoopConf(), upsertHandle.getOldFilePath());
+        wrapper = new SparkBoundedInMemoryExecutor(config,
+            storageReader.getRecordIterator(upsertHandle.getWriterSchema()),
Review comment:
       This is just creating a ParquetReader and getting an iterator to read all the records. Each record needs to be read in full here, since we are merging. I don't see how predicates are applicable here; aren't they applied in the InputFormat?
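
The reviewer's point is that the merge path must consume every record from the old file, so a filtering predicate on the reader would not reduce the work done here. A minimal, library-free sketch of that merge shape (the `Rec` type and `merge` helper below are illustrative stand-ins, not Hudi's API):

```java
import java.util.*;

public class MergeSketch {
    // Stand-in for an Avro/Parquet record: a record key plus a payload.
    public record Rec(String key, String payload) {}

    // Merge pass: consume the old file's iterator to the end, replacing
    // records whose key has an incoming update. No record can be skipped,
    // which is why predicate pushdown does not help on this path.
    public static List<Rec> merge(Iterator<Rec> oldRecords, Map<String, Rec> updates) {
        List<Rec> out = new ArrayList<>();
        while (oldRecords.hasNext()) {
            Rec old = oldRecords.next();
            out.add(updates.getOrDefault(old.key(), old));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> oldFile = List.of(new Rec("a", "v1"), new Rec("b", "v1"));
        Map<String, Rec> updates = Map.of("b", new Rec("b", "v2"));
        System.out.println(merge(oldFile.iterator(), updates));
    }
}
```

In the actual diff, the iterator comes from `storageReader.getRecordIterator(...)` (previously `new ParquetReaderIterator(reader)`), and the merge itself happens downstream in the upsert handle; this sketch only shows why the whole old file is read.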
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]