aditiwari01 opened a new issue #2802:
URL: https://github.com/apache/hudi/issues/2802
Hive reads the writer schema separately for each partition. If the schema has
evolved but not every partition has been written to since (i.e. for some
partitions the last write used the older schema), then Hive reads for those
partitions fail because the new column is missing from their schema.
Relevant code (class `AbstractRealtimeRecordReader`):
```
private void init() throws IOException {
  // Prefer the latest schema recorded in this split's log files.
  Schema schemaFromLogFile = LogReaderUtils.readLatestSchemaFromLogFiles(
      split.getBasePath(), split.getDeltaLogPaths(), jobConf);
  if (schemaFromLogFile == null) {
    // No schema found in the log files: fall back to the base file's schema.
    writerSchema = InputSplitUtils.getBaseFileSchema((FileSplit) split, jobConf);
    LOG.info("Writer Schema From Parquet => " + writerSchema.getFields());
  } else {
    writerSchema = schemaFromLogFile;
    LOG.info("Writer Schema From Log => " + writerSchema.toString(true));
  }
  // ...
}
```
I tried replacing this writer schema with the schema obtained from
`TableSchemaResolver`, and that works fine for me.
Does this look good? I haven't followed the Hive flow in detail yet.
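
To illustrate the difference between the two approaches, here is a minimal, self-contained sketch. It does not use any Hudi or Avro classes: a "schema" is modeled as a plain list of column names, and `WriterSchemaSketch`, `project`, and the column names are all hypothetical. It only shows why a writer schema taken from a stale partition can reject a newly added column, while a table-level schema lets the same old record be read with the new column as null.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WriterSchemaSketch {

    // Simplified model (an assumption, not Hudi's implementation):
    // a schema is an ordered list of column names. The query's columns are
    // projected out of one stored record, and the projection fails when a
    // requested column is absent from the writer schema.
    static Map<String, Object> project(List<String> writerSchema,
                                       Map<String, Object> storedRecord,
                                       List<String> requestedColumns) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String column : requestedColumns) {
            if (!writerSchema.contains(column)) {
                throw new IllegalStateException(
                    "Column not in writer schema: " + column);
            }
            // A column the record predates simply comes back as null.
            row.put(column, storedRecord.get(column));
        }
        return row;
    }

    public static void main(String[] args) {
        // Schema last written to the old partition, before "email" was added.
        List<String> oldPartitionSchema = Arrays.asList("id", "name");
        // Latest table-level schema, as known to Hive.
        List<String> tableSchema = Arrays.asList("id", "name", "email");

        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("id", 1);
        oldRecord.put("name", "a");

        // Per-partition writer schema: the new column cannot be resolved.
        try {
            project(oldPartitionSchema, oldRecord, tableSchema);
        } catch (IllegalStateException e) {
            System.out.println("Per-partition schema: " + e.getMessage());
        }

        // Table-level writer schema: the read succeeds, "email" is null.
        System.out.println("Table schema: "
            + project(tableSchema, oldRecord, tableSchema));
    }
}
```

In this model, the proposed change amounts to passing the table-level schema as `writerSchema` for every partition instead of whatever schema that partition was last written with.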