xiarixiaoyao commented on a change in pull request #4468:
URL: https://github.com/apache/hudi/pull/4468#discussion_r776545682
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java
##########
@@ -77,19 +74,17 @@ private boolean usesCustomPayload() {
}
   }
   /**
-   * Goes through the log files in reverse order and finds the schema from the last available data block. If not, falls
+   * Gets schema from HoodieTableMetaClient. If not, falls
    * back to the schema from the latest parquet file. Finally, sets the partition column and projection fields into the
    * job conf.
    */
-  private void init() throws IOException {
-    Schema schemaFromLogFile = LogReaderUtils.readLatestSchemaFromLogFiles(split.getBasePath(), split.getDeltaLogFiles(), jobConf);
-    if (schemaFromLogFile == null) {
-      writerSchema = InputSplitUtils.getBaseFileSchema((FileSplit) split, jobConf);
-      LOG.info("Writer Schema From Parquet => " + writerSchema.getFields());
-    } else {
-      writerSchema = schemaFromLogFile;
-      LOG.info("Writer Schema From Log => " + writerSchema.toString(true));
-    }
+  private void init() throws Exception {
+    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(split.getPath().getFileSystem(jobConf).getConf()).setBasePath(split.getBasePath()).build();
+    TableSchemaResolver schemaUtil = new TableSchemaResolver(metaClient);
Review comment:
@nsivabalan We found this problem a long time ago, and this fix really does solve it.
However, I noticed an additional comment in this function: `// TODO(vc): In the future, the reader schema should be updated based on log files & be able ...`
vc may have some extra considerations for this problem.
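
For context, a minimal sketch of the approach the diff takes (not the exact PR code): resolve the writer schema from the table's commit metadata via `TableSchemaResolver`, and fall back to the latest base (parquet) file schema if that fails. The surrounding fields (`split`, `jobConf`, `writerSchema`, `LOG`) and the fallback-on-exception structure are assumptions for illustration; `getTableAvroSchema` is the resolver's schema lookup in Hudi.

```java
// Sketch only: schema resolution via the table meta client, with a
// fallback to the base file schema (assumed error-handling structure).
private void init() throws Exception {
  HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
      .setConf(split.getPath().getFileSystem(jobConf).getConf())
      .setBasePath(split.getBasePath())
      .build();
  TableSchemaResolver schemaUtil = new TableSchemaResolver(metaClient);
  try {
    // Commit-metadata-backed schema reflects columns added by later writes,
    // unlike scanning only the split's own log files.
    writerSchema = schemaUtil.getTableAvroSchema();
    LOG.info("Writer Schema From MetaClient => " + writerSchema.getFields());
  } catch (Exception e) {
    // No resolvable table schema: fall back to the latest parquet file.
    writerSchema = InputSplitUtils.getBaseFileSchema((FileSplit) split, jobConf);
    LOG.info("Writer Schema From Parquet => " + writerSchema.getFields());
  }
}
```

The practical difference is that the old code only saw schemas present in this split's delta log files, while the meta-client path sees the table-level schema, which is what the TODO(vc) comment quoted above seems to anticipate.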
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]