zhangyue19921010 commented on issue #14310: URL: https://github.com/apache/hudi/issues/14310#issuecomment-3574401804
# Hudi LSM Benchmark Test Report

**Overview**: Drawing on the core LSM Tree concepts of sequential writing and layered merging, this report describes an improved data organization protocol for Hudi Merge-on-Read (MoR) tables that replaces Avro with Parquet as the format for individual log files, optimizing the MoR performance of lake tables.

# I. Background

As data lakes have been adopted more deeply within the company, a growing number of business scenarios rely on the MoR capability of lake tables. However, some core business scenarios (e.g., JD's traffic data warehouse, which features high update frequency and large data volume, with a single scenario generating up to billions of updated records per day) place higher performance demands on MoR. The update efficiency, Compaction performance, and read-time merging capability of lake table MoR therefore all need to reach an extremely high level.

To address these challenges, we designed and implemented a complete Hudi-LSM capability. Leveraging LSM's core advantages of sequential reads/writes and layered merging, we carried out comprehensive solution design and optimization across Hudi's core modules (reading, writing, TableService, and Common), enabling us to meet JD's more complex and diverse data business scenarios.

To evaluate the performance advantages of the optimized Hudi-MoR-LSM (hereinafter "LSM") against Apache Paimon 1.0.1 (hereinafter "Paimon") and Hudi-MoR-Avro 0.13.1 (hereinafter "Avro"), we conducted a series of comparative tests. The tests simulate the workload and data scale of real business scenarios to more accurately reflect performance differences in practical applications. The results provide a customized reference for business teams, guide the continuous optimization of the Hudi engine, and establish accurate performance benchmarks for the relevant scenarios.
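As a rough illustration of the "sequential writing + layered merging" idea referenced above, here is a toy sketch (not Hudi-LSM's actual implementation; the function and variable names are invented for illustration) of compacting small sorted runs into a larger one while keeping the newest value per key:

```python
# Toy illustration of LSM-style layered merging (NOT Hudi's implementation):
# writes land sequentially in small sorted runs, and compaction merges
# several runs into one larger sorted run, keeping the newest value per key.

def compact_sorted_runs(runs):
    """Merge sorted (key, value) runs; later runs are newer and win on ties."""
    merged = {}
    for run in runs:              # replay oldest-to-newest
        for key, value in run:
            merged[key] = value   # newer value overwrites the older one
    return sorted(merged.items())

# Two sequential "log" runs compacted into one "base" run.
run_0 = [("a", 1), ("c", 3)]      # older run
run_1 = [("a", 10), ("b", 2)]     # newer run: update to key "a"
print(compact_sorted_runs([run_0, run_1]))
# [('a', 10), ('b', 2), ('c', 3)]
```

Because each run is already sorted and compaction produces one sorted output, read-time merging only has to reconcile a bounded number of layers rather than replay every individual update.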
# II. Benchmark Tests

## 1. Test Conclusion

This benchmark is based on multiple types of datasets and table structures, including TPCDS and Nexmark. A total of 33 test cases were completed, covering five core business scenarios: stream reading, stream writing, batch reading, batch writing, and Compaction. In this round of tests, Hudi-LSM was compared with Avro (0.13.1) and Paimon (1.0.1) respectively.

*Note: The unit for performance comparison is consistent across all test cases.*

## 2. Test Details

### ① Writing Category

**General Scenarios**:
- TPCDS single-write data volume: 1,439,980,416
- TPCDS Compaction data volume: 1,439,980,416 × 2
- Nexmark data volume: 1 billion

|Dataset|Writing Mode|Engine|Operation Type|Machine Specification|LSM MoR|Avro MoR|Paimon|LSM vs. Avro Performance Difference|LSM vs. Paimon Performance Difference|
|---|---|---|---|---|---|---|---|---|---|
|TPCDS|Batch Write|Spark|Bulk_insert|100 × 2c 8g (200 CU)|435s (16,551 records/cu/s)|533s (13,508 records/cu/s)|450s (16,000 records/cu/s)|×1.23|×1.03|
||||Upsert|100 × 2c 8g (200 CU)|442s (16,289 records/cu/s)|805s (8,944 records/cu/s)|473s (15,221 records/cu/s)|×1.82|×1.07|
||||Compaction|100 × 2c 8g (200 CU)|415s (17,349 records/cu/s)|528s (13,636 records/cu/s)|244s (29,508 records/cu/s)|×1.27|×0.59|
|Nexmark|Stream Write|Flink|Upsert + Disable Compaction|5 × 2c 8g (10 CU)|54m25s (132,802 records/cu/s)|25m54s (64,350 records/cu/s)|12m33s (306,274 records/cu/s)|×4.3|×2.06|
||||Upsert + Enable Compaction|5 × 2c 8g (10 CU)|13m32s (123,152 records/cu/s)|61m4s (27,292 records/cu/s)|26m7s (63,816 records/cu/s)|×4.52|×1.93|
|TPCDS|Compaction|Spark|Compaction|10 × 2c 8g (20 CU)|11m20s (211,761 records/cu/s)|2h36m (156,519 records/cu/s)|15m20s (249,992 records/cu/s)|×10.6|×0.74|

### ② Query Category

**Data Volume**: 1,439,980,416 × 2

|Data Preparation|Query Engine|Query Mode|Query Type|Query Resources|LSM MoR|Avro MoR|Paimon|LSM vs. Avro Performance Difference|LSM vs. Paimon Performance Difference|
|---|---|---|---|---|---|---|---|---|---|
|TPCDS <br>Initialization + Upsert|Spark|Snapshot Query|Point Query|1 CU|13s (119,105 records/cu/s)|48s (32,257 records/cu/s)|9s (172,040 records/cu/s)|×3.7|×0.65|
||||Range Query|200 CU|83s (148,403 records/cu/s)|401s (35,910 records/cu/s)|74s (166,452 records/cu/s)|×4.83|×0.89|
||||Full Table Query|200 CU|287s (50,174 records/cu/s)|651s (22,120 records/cu/s)|164s (87,804 records/cu/s)|×2.27|×0.57|
||||Column Pruning|200 CU|88s (163,634 records/cu/s)|375s (38,399 records/cu/s)|77s (187,010 records/cu/s)|×4.26|×0.88|
|||Incremental Query|Point Query|1 CU|32.19s (30,664 records/cu/s)|641.49s (22,447 records/cu/s)|11.46s (86,200 records/cu/s)|×19.93|×0.36|
||||Range Query|200 CU|73.47s (167,653 records/cu/s)|469.69s (30,658 records/cu/s)|52.44s (234,887 records/cu/s)|×6.39|×0.71|
||||Full Table Query|200 CU|241.50s (59,627 records/cu/s)|669.83s (21,498 records/cu/s)|154.76s (93,044 records/cu/s)|×2.77|×0.64|
||||Column Pruning|200 CU|89.62s (160,677 records/cu/s)|555.87s (25,905 records/cu/s)|57.74s (249,406 records/cu/s)|×6.20|×0.64|
||Flink|Incremental Query|Range Query|200 CU|14min (342,852 records/cu/s)|91min (52,746 records/cu/s)|11min (436,357 records/cu/s)|×6.5|×0.78|
||||Full Table Query|200 CU|16.5min (290,905 records/cu/s)|95min (50,525 records/cu/s)|14.5min (331,029 records/cu/s)|×5.8|×0.87|
||||Column Pruning|200 CU|10min (479,993 records/cu/s)|89min (53,931 records/cu/s)|6.7min (716,408 records/cu/s)|×8.9|×0.67|

# III. Test Plan

## 1. Basic Configuration

- **Computing Engine**: Spark 3.4 for batch tasks and Flink 1.16 for stream tasks.
- **Test Configuration**: LSM, Paimon, and Avro are the single variable in each test task; all other configurations remain consistent. For task parameters/configurations, see the appendix.
- **Noise Elimination**: Each test case is executed 3 times.
After eliminating abnormal results, the average execution time of each test task is used as the performance indicator.
- **Test Environment**: Independent operating environments are built to eliminate interference from other environmental variables in the benchmark test.

|Environment Type|Single Task Specification|Minimum Number of Tasks|Recommended Number of Tasks|Total Resources|
|---|---|---|---|---|
|Flink Stream Task Environment|124 Cores / 496 GB|3 × 124 Cores / 496 GB|5 × 124 Cores / 496 GB|620 Cores / 2480 GB|
|Spark Batch Task Environment|204 Cores / 816 GB|3 × 204 Cores / 816 GB|5 × 204 Cores / 816 GB|1020 Cores / 4080 GB|

## 2. Resources/Machines

- **Flink (Single Task)**: Nexmark: 20 × (1c 4G), default parallelism 10, with an independent slot group configured for write operators.
- **Spark (Batch Task)**: Driver: 1 × 4c 16g; Executor: 50 × 2c 8g.

## 3. Test Datasets

- **Stream**: Nexmark (1 billion data records)
- **Batch**: TPCDS (1.44 billion data records)

## 4. Performance Calculation Formula

- Batch Task Execution Time = (Sum of execution times of N single tasks with a fixed data volume) / N
- Real-time Task Execution Time = (End time of consuming the fixed data volume − Start time), averaged over N runs
- Throughput (records/sec/CU) = Total processed data volume (records) ÷ Actual execution time (sec) ÷ Total consumed CU
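The formulas above can be sketched directly (a minimal illustration; the helper names are ours, and the sample numbers are taken from the TPCDS Bulk_insert row of the writing table):

```python
# Sketch of the performance formulas in section III.4 (helper names are ours).

def batch_execution_time(times):
    """Batch task execution time: average over N single-task runs
    with a fixed data volume."""
    return sum(times) / len(times)

def throughput(records, seconds, total_cu):
    """Throughput (records/sec/CU) = records / execution time / consumed CU."""
    return records / seconds / total_cu

# Cross-check against the writing table: TPCDS Bulk_insert on LSM MoR,
# 1,439,980,416 records in 435 s on 200 CU -> ~16,551 records/cu/s.
print(round(throughput(1_439_980_416, 435, 200)))  # 16551
```

Normalizing by both time and CU is what makes rows with different machine specifications (1 CU point queries vs. 200 CU scans) comparable in the tables above.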
