zhangyue19921010 commented on issue #14310: URL: https://github.com/apache/hudi/issues/14310#issuecomment-3574401804
# Hudi LSM Benchmark Test Report

**Overview**: Drawing on the core LSM Tree concepts of sequential writing and layered merging, this report describes an improved data organization protocol for Hudi Merge-on-Read (MoR) tables that replaces Avro with Parquet as the format for individual log files, optimizing the MoR performance of lake tables.

# I. Background

As data lakes have been adopted more deeply within the company, a growing number of business scenarios rely on the MoR capability of lake tables. However, some core business scenarios (e.g., JD's traffic data warehouse, which features high update frequency and large data volume, with a single scenario generating up to billions of updated records per day) place higher performance demands on MoR. The update efficiency, Compaction performance, and read-time merging capability of lake table MoR therefore all need to reach an extremely high level.

To address these challenges, we designed and implemented a complete Hudi-LSM capability. Leveraging LSM's core advantages of sequential reads/writes and layered merging, we carried out comprehensive solution design and optimization across Hudi's core modules (reading, writing, TableService, and Common), enabling us to meet JD's more complex and diverse data business scenarios.

To evaluate the performance advantages of the optimized Hudi-MoR-LSM (hereinafter "LSM") against Apache Paimon 1.0.1 (hereinafter "Paimon") and Hudi-MoR-Avro 0.13.1 (hereinafter "Avro"), we conducted a series of comparative tests. The tests simulate the workload and data scale of real business scenarios to more accurately reflect performance differences in practical applications. The results provide a customized reference for business teams, guide the continuous optimization of the Hudi engine, and establish accurate performance benchmarks for the relevant scenarios.
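As a rough illustration of the "sequential writing + layered merging" idea referenced above, here is a toy sketch (not Hudi-LSM's actual implementation; the function and variable names are invented for illustration) of compacting small sorted runs into a larger one while keeping the newest value per key:

```python
# Toy illustration of LSM-style layered merging (NOT Hudi's implementation):
# writes land sequentially in small sorted runs, and compaction merges
# several runs into one larger sorted run, keeping the newest value per key.

def compact_sorted_runs(runs):
    """Merge sorted (key, value) runs; later runs are newer and win on ties."""
    merged = {}
    for run in runs:              # replay oldest-to-newest
        for key, value in run:
            merged[key] = value   # newer value overwrites the older one
    return sorted(merged.items())

# Two sequential "log" runs compacted into one "base" run.
run_0 = [("a", 1), ("c", 3)]      # older run
run_1 = [("a", 10), ("b", 2)]     # newer run: update to key "a"
print(compact_sorted_runs([run_0, run_1]))
# [('a', 10), ('b', 2), ('c', 3)]
```

Because each run is already sorted and compaction produces one sorted output, read-time merging only has to reconcile a bounded number of layers rather than replay every individual update.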
# II. Benchmark Tests

## 1. Test Conclusion

This benchmark is based on multiple types of datasets and table structures, including TPCDS and Nexmark. A total of 33 test cases were completed, covering five core business scenarios: stream reading, stream writing, batch reading, batch writing, and Compaction. In this round of tests, Hudi-LSM was compared with Avro (0.13.1) and Paimon (1.0.1) respectively.

*Note: The unit for performance comparison is consistent across all test cases.*

## 2. Test Details

### ① Writing Category

**General Scenarios**:
- TPCDS single-write data volume: 1,439,980,416
- TPCDS Compaction data volume: 1,439,980,416 × 2
- Nexmark data volume: 1 billion

|Dataset|Writing Mode|Engine|Operation Type|Machine Specification|LSM MoR|Avro MoR|Paimon|LSM vs. Avro Performance Difference|LSM vs. Paimon Performance Difference|
|---|---|---|---|---|---|---|---|---|---|
|TPCDS|Batch Write|Spark|Bulk_insert|100 × 2c 8g (200 CU)|435s (16,551 records/cu/s)|533s (13,508 records/cu/s)|450s (16,000 records/cu/s)|×1.23|×1.03|
||||Upsert|100 × 2c 8g (200 CU)|442s (16,289 records/cu/s)|805s (8,944 records/cu/s)|473s (15,221 records/cu/s)|×1.82|×1.07|
||||Compaction|100 × 2c 8g (200 CU)|415s (17,349 records/cu/s)|528s (13,636 records/cu/s)|244s (29,508 records/cu/s)|×1.27|×0.59|
|Nexmark|Stream Write|Flink|Upsert + Disable Compaction|5 × 2c 8g (10 CU)|54m25s (132,802 records/cu/s)|25m54s (64,350 records/cu/s)|12m33s (306,274 records/cu/s)|×4.3|×2.06|
||||Upsert + Enable Compaction|5 × 2c 8g (10 CU)|13m32s (123,152 records/cu/s)|61m4s (27,292 records/cu/s)|26m7s (63,816 records/cu/s)|×4.52|×1.93|
|TPCDS|Compaction|Spark|Compaction|10 × 2c 8g (20 CU)|11m20s (211,761 records/cu/s)|2h36m (156,519 records/cu/s)|15m20s (249,992 records/cu/s)|×10.6|×0.74|

### ② Query Category

**Data Volume**: 1,439,980,416 × 2

|Data Preparation|Query Engine|Query Mode|Query Type|Query Resources|LSM MoR|Avro MoR|Paimon|LSM vs. Avro Performance Difference|LSM vs. Paimon Performance Difference|
|---|---|---|---|---|---|---|---|---|---|
|TPCDS <br>Initialization + Upsert|Spark|Snapshot Query|Point Query|1 CU|13s (119,105 records/cu/s)|48s (32,257 records/cu/s)|9s (172,040 records/cu/s)|×3.7|×0.65|
||||Range Query|200 CU|83s (148,403 records/cu/s)|401s (35,910 records/cu/s)|74s (166,452 records/cu/s)|×4.83|×0.89|
||||Full Table Query|200 CU|287s (50,174 records/cu/s)|651s (22,120 records/cu/s)|164s (87,804 records/cu/s)|×2.27|×0.57|
||||Column Pruning|200 CU|88s (163,634 records/cu/s)|375s (38,399 records/cu/s)|77s (187,010 records/cu/s)|×4.26|×0.88|
|||Incremental Query|Point Query|1 CU|32.19s (30,664 records/cu/s)|641.49s (22,447 records/cu/s)|11.46s (86,200 records/cu/s)|×19.93|×0.36|
||||Range Query|200 CU|73.47s (167,653 records/cu/s)|469.69s (30,658 records/cu/s)|52.44s (234,887 records/cu/s)|×6.39|×0.71|
||||Full Table Query|200 CU|241.50s (59,627 records/cu/s)|669.83s (21,498 records/cu/s)|154.76s (93,044 records/cu/s)|×2.77|×0.64|
||||Column Pruning|200 CU|89.62s (160,677 records/cu/s)|555.87s (25,905 records/cu/s)|57.74s (249,406 records/cu/s)|×6.20|×0.64|
||Flink|Incremental Query|Range Query|200 CU|14min (342,852 records/cu/s)|91min (52,746 records/cu/s)|11min (436,357 records/cu/s)|×6.5|×0.78|
||||Full Table Query|200 CU|16.5min (290,905 records/cu/s)|95min (50,525 records/cu/s)|14.5min (331,029 records/cu/s)|×5.8|×0.87|
||||Column Pruning|200 CU|10min (479,993 records/cu/s)|89min (53,931 records/cu/s)|6.7min (716,408 records/cu/s)|×8.9|×0.67|

# III. Test Plan

## 1. Basic Configuration

- **Computing Engine**: Spark 3.4 for batch tasks and Flink 1.16 for stream tasks.
- **Test Configuration**: LSM, Paimon, and Avro are the single variable in each test task; all other configurations remain consistent. For task parameters/configurations, see the appendix.
- **Noise Elimination**: Each test case is executed 3 times.
After eliminating abnormal results, the average execution time of each test task is used as the performance indicator.
- **Test Environment**: Independent operating environments are built to eliminate interference from other environmental variables in the benchmark test.

|Environment Type|Single Task Specification|Minimum Number of Tasks|Recommended Number of Tasks|Total Resources|
|---|---|---|---|---|
|Flink Stream Task Environment|124 Cores / 496 GB|3 × 124 Cores / 496 GB|5 × 124 Cores / 496 GB|620 Cores / 2480 GB|
|Spark Batch Task Environment|204 Cores / 816 GB|3 × 204 Cores / 816 GB|5 × 204 Cores / 816 GB|1020 Cores / 4080 GB|

## 2. Resources/Machines

- **Flink (Single Task)**: Nexmark: 20 × (1c 4G), default parallelism 10, with an independent slot group configured for write operators.
- **Spark (Batch Task)**: Driver: 1 × 4c 16g; Executor: 50 × 2c 8g.

## 3. Test Datasets

- **Stream**: Nexmark (1 billion data records)
- **Batch**: TPCDS (1.44 billion data records)

## 4. Performance Calculation Formula

- Batch Task Execution Time = (Sum of execution times of N single tasks with a fixed data volume) / N
- Real-time Task Execution Time = (End time of consuming the fixed data volume − Start time), averaged over N runs
- Throughput (records/sec/CU) = Total processed data volume (records) ÷ Actual execution time (sec) ÷ Total consumed CU
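The formulas above can be sketched directly (a minimal illustration; the helper names are ours, and the sample numbers are taken from the TPCDS Bulk_insert row of the writing table):

```python
# Sketch of the performance formulas in section III.4 (helper names are ours).

def batch_execution_time(times):
    """Batch task execution time: average over N single-task runs
    with a fixed data volume."""
    return sum(times) / len(times)

def throughput(records, seconds, total_cu):
    """Throughput (records/sec/CU) = records / execution time / consumed CU."""
    return records / seconds / total_cu

# Cross-check against the writing table: TPCDS Bulk_insert on LSM MoR,
# 1,439,980,416 records in 435 s on 200 CU -> ~16,551 records/cu/s.
print(round(throughput(1_439_980_416, 435, 200)))  # 16551
```

Normalizing by both time and CU is what makes rows with different machine specifications (1 CU point queries vs. 200 CU scans) comparable in the tables above.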
