[GitHub] [hudi] aditiwari01 opened a new issue #2802: Hive read issues when different partition have different schemas.

GitBox Sat, 10 Apr 2021 05:07:00 -0700


aditiwari01 opened a new issue #2802:
URL: https://github.com/apache/hudi/issues/2802



   Hive reads the writer schema separately for each partition. If the schema 
has evolved and not every partition has been updated since (i.e. the last 
write to some partition used the older schema), then a Hive read of that 
partition fails because the new column is missing from that partition's schema.
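
To make the failure mode concrete, here is a minimal sketch in plain Java. It does not use any Hudi or Hive classes: `PartitionSchemaMismatch` and `missingColumns` are hypothetical names, and a schema is modelled as a plain list of column names. It only shows that a query column added after schema evolution cannot be found in a partition that was last written with the older schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Hudi code: a "schema" is just a list of column names.
class PartitionSchemaMismatch {

    // Returns the requested columns that the partition's writer schema does not contain.
    static List<String> missingColumns(List<String> partitionWriterSchema,
                                       List<String> requestedColumns) {
        List<String> missing = new ArrayList<>();
        for (String column : requestedColumns) {
            if (!partitionWriterSchema.contains(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Partition written after the schema evolved (has the new "city" column).
        List<String> newPartition = Arrays.asList("id", "name", "city");
        // Partition whose last write used the older schema.
        List<String> oldPartition = Arrays.asList("id", "name");
        List<String> query = Arrays.asList("id", "city");

        System.out.println("new partition missing: " + missingColumns(newPartition, query));
        System.out.println("old partition missing: " + missingColumns(oldPartition, query));
    }
}
```

Running `main` prints an empty list for the new partition and `[city]` for the old one, which corresponds to the read failure described above.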
   
   Relevant code (class `AbstractRealtimeRecordReader`):
   
   ```
   private void init() throws IOException {
     // Prefer the schema recorded in the log files of this split.
     Schema schemaFromLogFile = LogReaderUtils.readLatestSchemaFromLogFiles(
         split.getBasePath(), split.getDeltaLogPaths(), jobConf);
     if (schemaFromLogFile == null) {
       // No schema in the log files: fall back to the schema of the base (Parquet) file.
       writerSchema = InputSplitUtils.getBaseFileSchema((FileSplit) split, jobConf);
       LOG.info("Writer Schema From Parquet => " + writerSchema.getFields());
     } else {
       writerSchema = schemaFromLogFile;
       LOG.info("Writer Schema From Log => " + writerSchema.toString(true));
     }
     // ... (rest of init() not shown in this excerpt)
   }
   ```
   
   I tried replacing this writer schema with the schema obtained from 
`TableSchemaResolver`, and this works fine for me. 
   
   Does this look good? I haven't followed the Hive flow in detail yet.
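
The idea behind the workaround can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the actual change: `TableLevelSchemaRead` and `project` are hypothetical names, no Hudi classes are used, and a schema is again a list of column names. It assumes the newly added column may be null, so a record stored without it reads back as null for that column.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Hudi code: validates columns against one table-level
// schema instead of the schema of the partition being read.
class TableLevelSchemaRead {

    // Projects the requested columns from a stored record.
    // Columns known to the table schema but absent from the record become null
    // (assumption: such columns are nullable).
    static Map<String, Object> project(List<String> tableSchema,
                                       Map<String, Object> storedRecord,
                                       List<String> requestedColumns) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String column : requestedColumns) {
            if (!tableSchema.contains(column)) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            row.put(column, storedRecord.get(column)); // null when the record lacks the column
        }
        return row;
    }

    public static void main(String[] args) {
        // Latest schema for the whole table, including the new "city" column.
        List<String> tableSchema = Arrays.asList("id", "name", "city");

        // Record from a partition last written with the older schema.
        Map<String, Object> oldRecord = new HashMap<>();
        oldRecord.put("id", 1);
        oldRecord.put("name", "a");

        // Prints {id=1, city=null}
        System.out.println(project(tableSchema, oldRecord, Arrays.asList("id", "city")));
    }
}
```

Because the lookup is against the table-wide schema, the old partition no longer fails on the new column; a column unknown to the table schema still raises an error.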


