danny0405 commented on code in PR #12776:
URL: https://github.com/apache/hudi/pull/12776#discussion_r1945860776
##########
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java:
##########
@@ -619,21 +622,19 @@ public void testSchemaEvolutionAndRollbackBlockInLastLogFile(ExternalSpillableMa
@Test
public void testSchemaEvolution() throws Exception {
-    ExternalSpillableMap.DiskMapType diskMapType = ExternalSpillableMap.DiskMapType.BITCASK;
- boolean isCompressionEnabled = true;
+    HoodieTableMetaClient metaClient = HoodieTestUtils.init(basePath.toString(), HoodieTableType.MERGE_ON_READ);
+ HoodieTestTable table = HoodieTestTable.of(metaClient);
Review Comment:
> This PR addresses the second point, as today's table schema resolver does not work properly in multi-writer scenarios, and https://github.com/apache/hudi/pull/12781 addresses the first point.
One thing I want to point out: `TableSchemaResolver` is designed mostly for the read path rather than the write path; `InternalSchemaCache` is the component you should use for writer-path schema evolution.
Each writer already holds its writer schema in the config. What you need is the current table schema, which you can query from `InternalSchemaCache` and then set in the commit metadata. Could you elaborate a little more on which schemas each writer needs and where you want to store them?
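
To illustrate the suggestion above, here is a rough sketch of how a writer could resolve the current schema through `InternalSchemaCache` and attach it to the commit metadata. This is only a sketch, not the actual change proposed in this PR: the exact method names (`searchSchemaAndCache`, `SerDeHelper.toJson`, the `LATEST_SCHEMA` key) are assumptions based on Hudi's internal-schema utilities and may differ across Hudi versions.

```java
// Hedged sketch (assumed Hudi APIs, not part of this PR): resolve the current
// table schema via InternalSchemaCache and record it in the commit metadata.
import org.apache.hudi.common.model.HoodieCommitMetadata;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.util.InternalSchemaCache;
import org.apache.hudi.internal.schema.InternalSchema;
import org.apache.hudi.internal.schema.utils.SerDeHelper;

public class WriterSchemaSketch {
  static void attachCurrentSchema(HoodieTableMetaClient metaClient,
                                  HoodieCommitMetadata commitMetadata,
                                  long schemaVersionId) {
    // Look up the schema for this version; the cache avoids re-reading the
    // timeline on every lookup. (Method name/signature is an assumption.)
    InternalSchema currentSchema =
        InternalSchemaCache.searchSchemaAndCache(schemaVersionId, metaClient, /* cacheEnable */ true);
    // Serialize the schema into the commit metadata so readers and other
    // writers can resolve it without a full timeline scan.
    commitMetadata.addMetadata(SerDeHelper.LATEST_SCHEMA, SerDeHelper.toJson(currentSchema));
  }
}
```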
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]