xiarixiaoyao commented on PR #8679:
URL: https://github.com/apache/hudi/pull/8679#issuecomment-1551560294

   The merge-on-read (MOR) implementation is update-friendly, but it has poor 
query performance for the following reasons:
    1) When querying a MOR table, Hudi needs to merge the base file with the 
log files, which adds CPU and memory cost.
    2) Data skipping works inefficiently: since the log is unsorted, the 
min-max index is ineffective for it.
    3) The secondary index cannot work on log files directly, so it performs 
poorly.
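
   To make points 1) and 2) concrete, here is a minimal sketch. It is a toy 
model, not Hudi's actual reader: the names `merge_on_read`, `min_max`, and 
`can_skip`, and the sample records, are all hypothetical.

```python
# Toy model of one MOR file slice (hypothetical; not Hudi's real code path).

def merge_on_read(base_rows, log_records):
    """Merge base-file rows with log records, keyed by record key.

    base_rows: list of (key, value) tuples from the base file.
    log_records: list of (key, value) tuples in arrival order;
                 a value of None marks a delete.
    """
    merged = dict(base_rows)            # every base row is loaded and keyed
    for key, value in log_records:      # later log records win
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return sorted(merged.items())


def min_max(values):
    """Column statistics as a (min, max) pair."""
    return min(values), max(values)


def can_skip(stats, lo, hi):
    """True if the range [lo, hi] cannot overlap the (min, max) stats."""
    return hi < stats[0] or lo > stats[1]


base = [("k1", 10), ("k2", 20), ("k3", 30)]
log = [("k2", 95), ("k3", None), ("k4", 5)]   # unsorted updates and a delete

merged = merge_on_read(base, log)

base_stats = min_max([v for _, v in base])
log_stats = min_max([v for _, v in log if v is not None])

# The log holds only 5 and 95, but its range covers almost everything,
# so a query for values in [40, 50] still has to read it.
log_skippable = can_skip(log_stats, 40, 50)

# Stats taken from the base file alone would skip a query for [90, 100],
# although the merged view contains 95.
base_skippable = can_skip(base_stats, 90, 100)
```

   In this model the reader has to key every base row before it can return 
anything, and neither set of statistics is useful for pruning on its own.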
    
   How about introducing delete vectors to Hudi, as Doris, Delta Lake, and 
Hologres have done?
   Here is the Delta Lake design:
   
https://docs.google.com/document/d/1lv35ZPfioopBbzQ7zT82LOev7qV7x4YNLkMr2-L5E_M/edit#heading=h.dd2cc57oc5wk
   With delete vectors:
      1) the data merge operation is eliminated;
      2) MDT and the min-max/secondary indexes can work well.
   Of course, we would also lose our payload capability if we used delete vectors.
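
   A rough sketch of the idea, under simplifying assumptions: each file has 
a set of deleted row positions, an update marks the old row deleted and 
appends the new row to another file, and a read only filters by position. 
The names `upsert` and `scan` and the file names are hypothetical, and this 
is not the Delta Lake or Hudi implementation.

```python
# Toy model of delete vectors (hypothetical; for illustration only).

def upsert(files, delete_vectors, key, value, target_file):
    """Mark existing rows with this key as deleted, then append the new row.

    files: dict of file name -> list of (key, value) rows.
    delete_vectors: dict of file name -> set of deleted row positions.
    """
    for name, rows in files.items():
        deleted = delete_vectors.setdefault(name, set())
        for pos, (row_key, _) in enumerate(rows):
            if row_key == key:
                deleted.add(pos)
    files.setdefault(target_file, []).append((key, value))


def scan(files, delete_vectors):
    """Read all live rows: a positional filter, with no key-based merge."""
    out = []
    for name, rows in files.items():
        deleted = delete_vectors.get(name, set())
        out.extend(row for pos, row in enumerate(rows) if pos not in deleted)
    return out


files = {"base-1": [("k1", 10), ("k2", 20), ("k3", 30)]}
delete_vectors = {}

upsert(files, delete_vectors, "k2", 95, "base-2")
live_rows = scan(files, delete_vectors)

# Per-file min-max stats remain valid upper and lower bounds for the live
# rows: base-1 still lies within (10, 30), and base-2 is exactly (95, 95).
```

   Note that the new row replaces the old one wholesale. There is no point 
in this model where a custom payload could combine the old and new values, 
which is the capability loss mentioned above.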


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
