hudi-bot opened a new issue, #14930:
URL: https://github.com/apache/hudi/issues/14930

   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-2873
   - Type: New Feature
   - Epic: https://issues.apache.org/jira/browse/HUDI-2100
   
   
   ---
   
   
   ## Comments
   
   12/Jan/22 17:58;alexey.kudinkin;[~xiaotaotao] can you please elaborate on 
what exactly you're planning to do as part of this task?;;;
   
   ---
   
   17/Jan/22 12:18;shibei;[~xiaotaotao] Can you describe your idea in detail?;;;
   
   ---
   
   17/Jan/22 13:34;mengtao;[~alexey.kudinkin]  [~shibei] 
   
   1) Support optimizing data through Spark SQL, similar to Delta Lake: OPTIMIZE 
xx_table ZORDER/HILBERT by col1, col2;
   
   2) Introduce a new write operation that rewrites table data directly. At 
present, the clustering operation performs slightly worse than a direct 
overwrite.
   
    
   
    ;;;
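
For readers unfamiliar with the ZORDER option proposed in item 1 above, here is a minimal sketch of the idea behind Z-ordering: interleave the bits of two integer columns into a single Morton code and sort by it, so that rows close in both columns tend to land near each other. The function names `interleave_bits` and `zorder_sort` are hypothetical, the 16-bit width is an assumption, and this is not how Hudi or Delta Lake implement the feature.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Morton (Z-order) code of two non-negative integers."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return code


def zorder_sort(rows, key_x, key_y):
    """Sort a list of dict rows by the Z-order code of two integer columns."""
    return sorted(rows, key=lambda r: interleave_bits(r[key_x], r[key_y]))


# A 4x4 grid of (col1, col2) values, reordered along the Z-order curve.
rows = [{"col1": c1, "col2": c2} for c1 in range(4) for c2 in range(4)]
ordered = zorder_sort(rows, "col1", "col2")
```

Sorting by the interleaved code, rather than by `col1` and then `col2`, keeps neither column strictly sorted but lets range filters on either column skip more files.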
   
   ---
   
   18/Jan/22 03:23;shibei;[~xiaotaotao] Two things need to be clarified:
   1) Compaction is closer to the semantics of optimize in the data world. At 
present, compaction cannot sort data when merging log files into base files, 
but the example above implies that the optimize command should trigger data 
sorting, which means it should be implemented on top of clustering, correct?
   
   2) Is it possible to optimize the clustering operation instead of 
introducing a new write operation?
   
    ;;;
   
   ---
   
   18/Jan/22 10:26;mengtao;[~shibei] Do you have WeChat? Please add me: 
1037817390;;;

