danny0405 opened a new issue, #14310:
URL: https://github.com/apache/hudi/issues/14310

   ### Feature Description
   
   **What the feature achieves:**
   - fast streaming ingestion with minimal write amplification
   - more efficient queries and compaction (5x)
   - more efficient point queries with fast data skipping (10x)
   - the leveled layout supports more flexible compaction strategies, e.g., minor compaction, which is more friendly for streaming
   - sorted columnar files have a better compression ratio than Avro (10x)
   - more efficient integration with popular OLAP engines such as StarRocks and Doris, most of which also have native LSM-style storage backends
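   To illustrate why sorted files make compaction cheap, here is a minimal sketch of generic LSM-style compaction: a k-way merge of key-sorted runs in which the newest run wins on a key collision. This is not Hudi's proposed design; the function names are hypothetical, and it assumes each run has unique keys and that there are no delete markers.

```python
import heapq
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (record key, payload)


def _tag(run: List[Record], run_idx: int) -> Iterator[Tuple[str, int, str]]:
    # Negate the run index so that, for equal keys, the newer run sorts first.
    for key, value in run:
        yield (key, -run_idx, value)


def merge_sorted_runs(runs: List[List[Record]]) -> List[Record]:
    """Merge key-sorted runs; runs[i] is newer than runs[i - 1]."""
    merged = heapq.merge(*(_tag(run, i) for i, run in enumerate(runs)))
    result: List[Record] = []
    last_key = None
    for key, _neg_idx, value in merged:
        # The first occurrence of a key comes from the newest run; skip older versions.
        if key != last_key:
            result.append((key, value))
            last_key = key
    return result
```

   Because every input is already sorted, the merge streams through each run once without materializing a hash table, which is the property the leveled layout relies on.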
   
   **Why this feature is needed:**
   In 1.1, we made a lot of effort to improve the performance of streaming writes and reads with Flink. In analytical scenarios, however, many queries still need better efficiency to meet shorter end-to-end response times (SLA) even with sufficient CPU/memory resources configured, and the current base+delta merging does not scale well in this case. In industry, StarRocks and Doris both have LSM-style storage backends to support OLAP queries, and even for OLTP there are practices like [MyRocks](https://www.vldb.org/pvldb/vol13/p3217-matsunobu.pdf) at Meta. Taking MyRocks as an example, they report almost half cost savings after migrating from B-tree to LSM.
   
   JD also has production practices with outperforming numbers (TODO: the benchmark and real production numbers for LSM with Hudi).
   
   ### User Experience
   
   **How users will use this feature:**
   A new option can be specified to declare the layout type, either the current layout or LSM; no explicit API change is required for users.
   
   
   ### Hudi RFC Requirements
   
   **RFC PR link:** (if applicable)
   
   Todo
   

