This is an automated email from the ASF dual-hosted git repository.

zhangyue19921010 pushed a commit to branch rfc-103
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit 41ee3d5b40b499317c6512cf0b5404b2560a05d9
Author: YueZhang <[email protected]>
AuthorDate: Tue Jan 13 20:03:33 2026 +0800

    add details for parquet log format
---
 rfc/rfc-103/rfc-103.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/rfc/rfc-103/rfc-103.md b/rfc/rfc-103/rfc-103.md
index 2938b63f19d1..b3c250dfa272 100644
--- a/rfc/rfc-103/rfc-103.md
+++ b/rfc/rfc-103/rfc-103.md
@@ -156,6 +156,19 @@ These features amplify the benefits of the LSM layout but are not strictly requi
 - Better compression than Avro
 - Support pruning during reads
 
+Switching log file format from Avro to Parquet requires the following changes:
+
+0. Parquet log file naming format should remain consistent with existing Avro logs to ensure compatibility with existing MOR tables
+1. **Writer changes**: Block append operations are no longer supported. During writes, input data is sorted and deduplicated, then written directly to new Parquet files using a Create handler:
+   - For **Spark**: reuse the bulk insert write logic
+   - For **Flink**: refactor the upsert write logic. Data preparation, metadata field addition, and sorting logic can be reused, but the final write should use the Parquet Create Handler to write new Parquet log files
+2. **Reader changes**: When reading Parquet log files, skip the logic for handling delete blocks and damaged blocks. Read data directly using the Parquet Log Reader, enabling optimizations such as vectorized reads and column pruning
+3. **Markers**: Implement a new MOR marker write mechanism. Create markers are written during writes, similar to COW create markers
+4. **Rollback**: Handle both Marker-Based Rollback and Listing-Based Rollback scenarios:
+   - For MOR Parquet logs, damaged files are deleted directly (similar to COW)
+5. **Cleaning**: MOR Parquet log file cleaning directly deletes the corresponding Parquet log files (similar to COW Parquet)
+
+
 **Behavior changes**
 
 - MOR **rollback** deletes Parquet log files directly, instead of appending a delete block.
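
The data-preparation step in item 1 of the diff (sort and deduplicate the input before writing a new file instead of appending a block) can be sketched in a few lines of Python. This is only an illustration, not Hudi code: the `Record` type, its `ordering` field, and the rule that the larger ordering value wins on duplicate keys are all assumptions, since the RFC text does not say which duplicate is kept.

```python
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    key: str       # hypothetical record key
    ordering: int  # hypothetical ordering value; assumed: larger wins on duplicates
    payload: str


def prepare_log_batch(records: Iterable[Record]) -> List[Record]:
    """Deduplicate by key (keeping the largest ordering value), then sort by key."""
    latest: Dict[str, Record] = {}
    for rec in records:
        current = latest.get(rec.key)
        # On equal ordering values the later record in the input wins (assumption).
        if current is None or rec.ordering >= current.ordering:
            latest[rec.key] = rec
    return sorted(latest.values(), key=lambda r: r.key)


batch = [
    Record("k2", 1, "a"),
    Record("k1", 5, "b"),
    Record("k2", 3, "c"),
]
prepared = prepare_log_batch(batch)
print([(r.key, r.payload) for r in prepared])
```

In this sketch the prepared batch would then be handed to whatever writes the new Parquet log file; that write step is left out because the diff names a Create handler without describing its interface.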
