yihua commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r806192810



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -115,6 +117,21 @@ private MessageType getTableParquetSchemaFromDataFile() {
     }
   }
 
+  private MessageType readSchemaFromBaseFile(String filePath) throws IOException {
+    if (filePath.contains(HoodieFileFormat.PARQUET.getFileExtension())) {
+      // this is a parquet file
+      return readSchemaFromParquetBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.HFILE.getFileExtension())) {
+      // this is a HFile
+      return readSchemaFromHFileBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.ORC.getFileExtension())) {
+      // this is a ORC file
+      return readSchemaFromORCBaseFile(new Path(filePath));
+    } else {
+      throw new IllegalArgumentException("Unknown base file format :" + filePath);
+    }

Review comment:
       I was leaning toward the following pattern for the branching, so that 
this method does not have to name specific formats. That makes adding a new 
format easier, since it only requires changes in `BaseFileUtils`:
   ```
   BaseFileUtils.getInstance(filePath).readAvroSchema(conf, filePath)
   ```
   
   However, I see that the HFile format is not covered by `BaseFileUtils`, so 
the current approach is probably fine for now.  @zhangyue19921010 could you 
file a ticket to fix that as a follow-up?
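
To make the suggestion concrete, here is a minimal, self-contained sketch of extension-based dispatch. It does not use Hudi's real classes: `Format`, `SchemaReader`, `READERS`, and the string "schemas" are hypothetical stand-ins, and matching on the file suffix is an assumption of the sketch. HFile is deliberately left unregistered to mirror the gap described above.

```java
import java.util.EnumMap;
import java.util.Map;

class FormatDispatchSketch {

    // Hypothetical stand-in for a file-format enum; only the extension matters here.
    enum Format {
        PARQUET(".parquet"),
        ORC(".orc");

        private final String extension;

        Format(String extension) {
            this.extension = extension;
        }

        String getFileExtension() {
            return extension;
        }
    }

    // Hypothetical per-format reader; a real one would return a schema object.
    interface SchemaReader {
        String readSchema(String filePath);
    }

    // Single registry: supporting a new format means adding one entry here,
    // and callers stay untouched.
    private static final Map<Format, SchemaReader> READERS = new EnumMap<>(Format.class);

    static {
        READERS.put(Format.PARQUET, path -> "parquet-schema:" + path);
        READERS.put(Format.ORC, path -> "orc-schema:" + path);
    }

    // Picks the reader whose extension the path ends with (assumption: suffix match).
    static SchemaReader getInstance(String filePath) {
        for (Map.Entry<Format, SchemaReader> entry : READERS.entrySet()) {
            if (filePath.endsWith(entry.getKey().getFileExtension())) {
                return entry.getValue();
            }
        }
        throw new UnsupportedOperationException(
            "The format for file " + filePath + " is not supported yet.");
    }

    // The caller no longer names any specific format.
    static String readSchemaFromBaseFile(String filePath) {
        return getInstance(filePath).readSchema(filePath);
    }
}
```

With this shape, the caller reduces to a single line and an unregistered extension (such as `.hfile` in this sketch) fails in one place, which is the property the pattern above is after.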



