hudi-bot opened a new issue, #14930: URL: https://github.com/apache/hudi/issues/14930
## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-2873
- Type: New Feature
- Epic: https://issues.apache.org/jira/browse/HUDI-2100

---

## Comments

12/Jan/22 17:58;alexey.kudinkin;[~xiaotaotao] Can you please elaborate on what exactly you're planning to do as part of this task?;;;

---

17/Jan/22 12:18;shibei;[~xiaotaotao] Can you describe your idea in detail?;;;

---

17/Jan/22 13:34;mengtao;[~alexey.kudinkin] [~shibei]
1) Support optimizing data via Spark SQL, just like Delta Lake: `OPTIMIZE xx_table ZORDER/HILBERT BY col1, col2`;
2) Introduce a new write operation that rewrites table data directly. At present, the clustering operation performs slightly worse than a direct overwrite.;;;

---

18/Jan/22 03:23;shibei;[~xiaotaotao] Two things need to be clarified:
1) Compaction is closer to the semantics of "optimize" in the data world. At present, compaction cannot sort data when merging log files into base files. However, the example above means the optimize command should trigger data sorting, which implies the optimize command should be built on clustering. Correct?
2) Is it possible to optimize the clustering operation instead of introducing a new write operation?;;;

---

18/Jan/22 10:26;mengtao;[~shibei] Do you have WeChat? Please add me: 1037817390;;;
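The `OPTIMIZE xx_table ZORDER BY col1, col2` syntax proposed in the 17/Jan/22 comment rewrites data sorted by a space-filling-curve key. This groups rows with similar values in *both* columns into the same files. The sketch below illustrates the idea with a plain Morton (bit-interleaving) key over two non-negative integer columns.

It is an assumption-laden illustration, not Hudi's or Delta Lake's implementation. A real engine must also normalize strings, negative numbers and other types. The sketch does not cover the HILBERT variant. The names `interleave_bits` and `zorder_sort` are hypothetical.

```python
from typing import Dict, List


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key.

    Bit i of x lands at position 2*i and bit i of y at position 2*i + 1.
    Bits above `bits` are ignored in this sketch.
    """
    if x < 0 or y < 0:
        raise ValueError("this sketch only handles non-negative integers")
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def zorder_sort(rows: List[Dict[str, int]], col1: str, col2: str) -> List[Dict[str, int]]:
    """Order rows by the Z-order key of (col1, col2).

    This roughly mirrors the sort step a hypothetical
    `OPTIMIZE t ZORDER BY col1, col2` would perform before rewriting files.
    """
    return sorted(rows, key=lambda r: interleave_bits(r[col1], r[col2]))
```

On a 4x4 grid of `(a, b)` values, the resulting order visits the 2x2 block `(0,0), (1,0), (0,1), (1,1)` first, then the next block starting at `(2,0)`. Writing rows out in this order keeps nearby points in both dimensions together, which is what makes per-file min/max statistics useful for data skipping on either column.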
