Davis-Zhang-Onehouse commented on code in PR #12776:
URL: https://github.com/apache/hudi/pull/12776#discussion_r1945959091
##########
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java:
##########
@@ -619,21 +622,19 @@ public void testSchemaEvolutionAndRollbackBlockInLastLogFile(ExternalSpillableMa
   @Test
   public void testSchemaEvolution() throws Exception {
-    ExternalSpillableMap.DiskMapType diskMapType = ExternalSpillableMap.DiskMapType.BITCASK;
-    boolean isCompressionEnabled = true;
+    HoodieTableMetaClient metaClient = HoodieTestUtils.init(basePath.toString(), HoodieTableType.MERGE_ON_READ);
+    HoodieTestTable table = HoodieTestTable.of(metaClient);
Review Comment:
Missed you in today's standup, so here is the update:
- Back when we implemented RFC 82 in Hudi internally, we weren't aware of
this alternative, as its name does not indicate that it contains the "table schema".
- As part of RFC 82, when there are concurrent schema evolutions, the
protocol does not go through InternalSchemaCache either. In the validation phase,
we resolve the new table schema and write the updated table schema to the
commit metadata.
With RFC 82, what is ensured is:
- For commit (COW), delta commit (MOR), and replacement commit (both),
if the commit metadata contains a schema field, that field tracks the table
schema at that time.
- The table schema resolver now returns the "table schema" at lower cost and
with less overhead. It avoids saving the commit metadata and saves only the
schema to reduce memory pressure, with the same caching behavior as before and
no extra overhead (to filter out clustering, it just checks the requested
instant file name without parsing the commit metadata).
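
To make the second point concrete, here is a minimal sketch of the lookup as I would describe it, not the actual implementation. Everything in it is hypothetical: `TableSchemaResolverSketch`, `Instant`, the `.clustering.requested` suffix, and the `parseSchema` helper are stand-ins rather than Hudi classes or real file naming. It only illustrates the three ideas above: skip clustering by file name, parse commit metadata at most once per instant, and cache the schema string rather than the metadata.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch only; none of these types are Hudi classes. */
final class TableSchemaResolverSketch {

  /** Stand-in for a completed instant on the timeline (hypothetical). */
  static final class Instant {
    final String timestamp;
    final String requestedFileName;
    final String schemaInMetadata; // null when the commit metadata has no schema field

    Instant(String timestamp, String requestedFileName, String schemaInMetadata) {
      this.timestamp = timestamp;
      this.requestedFileName = requestedFileName;
      this.schemaInMetadata = schemaInMetadata;
    }
  }

  // Assumed marker for clustering requests; the real file naming may differ.
  static final String CLUSTERING_REQUESTED_SUFFIX = ".clustering.requested";

  // Cache holds only the schema (or its absence), never the full commit metadata.
  private final Map<String, Optional<String>> schemaByInstant = new HashMap<>();

  // Counts simulated metadata parses so the saving is observable.
  int metadataParses = 0;

  /** Simulates the expensive step of parsing commit metadata for its schema field. */
  private Optional<String> parseSchema(Instant instant) {
    metadataParses++;
    return Optional.ofNullable(instant.schemaInMetadata);
  }

  /** Walks instants newest-first and returns the first schema found. */
  Optional<String> resolveLatestTableSchema(List<Instant> instantsNewestFirst) {
    for (Instant instant : instantsNewestFirst) {
      // Clustering is filtered by file name alone, with no metadata parsing.
      if (instant.requestedFileName.endsWith(CLUSTERING_REQUESTED_SUFFIX)) {
        continue;
      }
      Optional<String> schema = schemaByInstant.get(instant.timestamp);
      if (schema == null) {
        schema = parseSchema(instant);
        schemaByInstant.put(instant.timestamp, schema);
      }
      if (schema.isPresent()) {
        return schema;
      }
    }
    return Optional.empty();
  }
}
```

In this sketch a repeated call over the same timeline does no further parsing, which is the "same caching behavior, lower memory" trade-off described above.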
For more information, please take a look at the RFC and
https://github.com/apache/hudi/pull/12781. Happy to chat over a meeting as well.
I take your point that there are alternative systems that may be intended to
achieve similar things. From the perspective of getting RFC 82 ready in OSS
in the most straightforward way, are there any major concerns that should
block the improvement introduced here?