nsivabalan commented on a change in pull request #4468:
URL: https://github.com/apache/hudi/pull/4468#discussion_r803019198
##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java
##########
@@ -77,19 +74,17 @@ private boolean usesCustomPayload() {
}
/**
- * Goes through the log files in reverse order and finds the schema from the last available data block. If not, falls
+ * Gets schema from HoodieTableMetaClient. If not, falls
* back to the schema from the latest parquet file. Finally, sets the partition column and projection fields into the
* job conf.
*/
- private void init() throws IOException {
- Schema schemaFromLogFile = LogReaderUtils.readLatestSchemaFromLogFiles(split.getBasePath(), split.getDeltaLogFiles(), jobConf);
- if (schemaFromLogFile == null) {
- writerSchema = InputSplitUtils.getBaseFileSchema((FileSplit)split, jobConf);
- LOG.info("Writer Schema From Parquet => " + writerSchema.getFields());
- } else {
- writerSchema = schemaFromLogFile;
- LOG.info("Writer Schema From Log => " + writerSchema.toString(true));
- }
+ private void init() throws Exception {
+
+ HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(split.getPath().getFileSystem(jobConf).getConf()).setBasePath(split.getBasePath()).build();
+ TableSchemaResolver schemaUtil = new TableSchemaResolver(metaClient);
Review comment:
I checked the latest master, and the code has evolved due to key dedup for
metadata and other requirements. We already have a metaClient in
AbstractHoodieLogRecordReader. Can you rebase onto the latest master and
revisit the actual fix? The fix may have to be done at a higher layer, because
I see that schema resolution is not done in this class on the latest master.
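
For reference while revisiting the fix, the lookup order described in the Javadoc above (table-level schema first, then the latest base file) could be sketched roughly as below. This is only an illustration under assumptions: `SchemaFallbackSketch`, `resolveSchema`, and the string "schemas" are hypothetical stand-ins, not Hudi APIs, and where this logic should live on the latest master is still an open question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in Hudi.
class SchemaFallbackSketch {

    // Tries each schema source in order and returns the first schema found.
    // Sources are Suppliers, so later (possibly more expensive) lookups run
    // only when the earlier ones return nothing.
    static Optional<String> resolveSchema(List<Supplier<Optional<String>>> sources) {
        for (Supplier<Optional<String>> source : sources) {
            Optional<String> schema = source.get();
            if (schema.isPresent()) {
                return schema;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for a table-level lookup that finds nothing.
        Supplier<Optional<String>> fromTable = Optional::empty;
        // Stand-in for reading the schema from the latest base file.
        Supplier<Optional<String>> fromBaseFile = () -> Optional.of("base-file-schema");

        String schema = resolveSchema(Arrays.asList(fromTable, fromBaseFile)).orElse("none");
        System.out.println(schema); // prints "base-file-schema"
    }
}
```

Keeping the sources as an ordered list would make it straightforward to change the priority or add another source once the right layer for the fix is settled.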
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.