Re: [I] New LSM file layout [hudi]

via GitHub Sun, 21 Dec 2025 18:23:24 -0800


zhangyue19921010 commented on issue #14310:
URL: https://github.com/apache/hudi/issues/14310#issuecomment-3680087964


   Hi @vinothchandar and @danny0405 
   
   The plan is to split it into the following steps:
   1. Conduct another benchmark for the Hudi 1.x version.
   2. Prepare a detailed design document, mainly covering:
       1. The organizational form and naming conventions of Hudi Core and LSM 
files.
       2. Flink/Spark Write: high-performance sorted writing by primary key.
       3. Flink/Spark Read: multi-way merge sort.
       4. Compaction, including multi-level compaction strategies and 
hierarchical merging.
   3. Start development, completing the end-to-end pipeline on Flink first:
       1. Flink writing
       2. Flink reading
       3. Flink compaction
   4. Develop the Spark side of the LSM support.
   5. Adapt query engines such as Presto and Hive.
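
To make items 2.2 to 2.4 concrete, here is a minimal sketch of a multi-way merge over runs sorted by primary key. It is only an illustration under stated assumptions, not Hudi code: the name `merge_sorted_runs`, the `(key, value)` record shape, and the "newest run wins" rule for duplicate keys are all hypothetical choices made for this example.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_runs(runs):
    """Merge key-sorted runs into one key-sorted stream.

    Assumptions (hypothetical, for illustration only):
    - `runs` is ordered oldest to newest;
    - each run is a list of (key, value) pairs sorted by key,
      with at most one record per key;
    - when several runs contain the same key, the newest run wins.
    """
    def tagged(run, age):
        # Attach an age so that, for equal keys, newer records sort first.
        for key, value in run:
            yield key, age, value

    newest = len(runs) - 1
    streams = [tagged(run, newest - i) for i, run in enumerate(runs)]

    # heapq.merge performs the k-way merge lazily; the sort key is
    # (primary key, age), so values themselves are never compared.
    merged = heapq.merge(*streams, key=itemgetter(0, 1))

    # Keep only the first (newest) record of each primary-key group.
    for key, group in groupby(merged, key=itemgetter(0)):
        _, _, value = next(group)
        yield key, value


if __name__ == "__main__":
    runs = [
        [(1, "a0"), (3, "c0"), (5, "e0")],  # oldest run
        [(2, "b1"), (3, "c1")],
        [(1, "a2"), (6, "f2")],             # newest run
    ]
    for record in merge_sorted_runs(runs):
        print(record)
```

In this simplified model a reader would call the function over all runs of a file group, and a compaction would call the same function over a subset of runs and write the result back as a single new sorted run.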
   
   Submitting a PR based directly on the open-source 0.13.1 version, to reduce 
noise, may still require some adaptation work (we have also made many internal 
modifications on top of 0.13.1 that are unrelated to LSM). I can describe the 
design ideas in as much detail as possible in the design document.
   
   Is this acceptable?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
