scxwhite opened a new pull request #4400:
URL: https://github.com/apache/hudi/pull/4400


   
   Brief change log
   
     - Improve compaction performance
   
   I found that when the compaction plan is generated, the delta log files under 
each file group are arranged in the natural (ascending) order of instant time. In 
most cases the latest data is in the latest delta log file, so this change sorts 
the log files in descending order of instant time. That largely avoids rewriting 
records during compaction and therefore shortens compaction time.
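
   As a rough illustration of the ordering described above, here is a minimal sketch. The `LogFileOrdering` and `DeltaLogFile` types are hypothetical stand-ins, not Hudi classes, and the sketch assumes instant times are fixed-width timestamp strings, so that string order matches chronological order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; these types are illustrative and are not Hudi classes.
class LogFileOrdering {

    // Minimal stand-in for one delta log file inside a file group.
    static final class DeltaLogFile {
        final String instantTime; // assumed to be a fixed-width timestamp string
        final String name;

        DeltaLogFile(String instantTime, String name) {
            this.instantTime = instantTime;
            this.name = name;
        }
    }

    // Returns a copy ordered from the largest instant time to the smallest.
    static List<DeltaLogFile> newestFirst(List<DeltaLogFile> files) {
        List<DeltaLogFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparing((DeltaLogFile f) -> f.instantTime).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<DeltaLogFile> files = Arrays.asList(
            new DeltaLogFile("20211220090000", "log.1"),
            new DeltaLogFile("20211221090000", "log.3"),
            new DeltaLogFile("20211220150000", "log.2"));
        // Print the files in the order a reader would scan them.
        for (DeltaLogFile f : newestFirst(files)) {
            System.out.println(f.instantTime + " " + f.name);
        }
    }
}
```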
   
   In addition, when reading a delta log file, we compare each log record with 
the record already held in the external spillable map. If the old record is 
selected, there is no need to write it back into the map. Rewriting it wastes a 
lot of resources once the map has spilled to disk.
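
   The skip-on-old-record idea can be sketched as follows. This is not the code in this PR: `MergeSkipSketch`, `Rec`, and `orderingValue` are hypothetical names, a plain `HashMap` stands in for the spillable map, and "larger ordering value wins, ties go to the incoming record" is an assumed merge rule chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a HashMap stands in for the external spillable map.
class MergeSkipSketch {

    // Minimal record: a key, a value used to pick the winner, and a payload.
    static final class Rec {
        final String key;
        final long orderingValue;
        final String payload;

        Rec(String key, long orderingValue, String payload) {
            this.key = key;
            this.orderingValue = orderingValue;
            this.payload = payload;
        }
    }

    final Map<String, Rec> records = new HashMap<>();
    int writes = 0; // counts puts, i.e. potential disk writes after a spill

    void process(Rec incoming) {
        Rec existing = records.get(incoming.key);
        if (existing == null) {
            records.put(incoming.key, incoming);
            writes++;
            return;
        }
        // Assumed merge rule: the larger ordering value wins, ties go to incoming.
        Rec winner = incoming.orderingValue >= existing.orderingValue ? incoming : existing;
        if (winner != existing) {
            // Only write when the stored record actually changes.
            records.put(incoming.key, winner);
            writes++;
        }
    }

    public static void main(String[] args) {
        MergeSkipSketch newestFirst = new MergeSkipSketch();
        newestFirst.process(new Rec("k1", 3L, "v3"));
        newestFirst.process(new Rec("k1", 2L, "v2"));
        newestFirst.process(new Rec("k1", 1L, "v1"));
        System.out.println("writes when scanning newest first: " + newestFirst.writes);
    }
}
```

   Under these assumptions, scanning three versions of one key from newest to oldest writes to the map once, while scanning them from oldest to newest writes three times, which is why the two changes work well together.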
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   Committer checklist
   
    - [*] Has a corresponding 
[JIRA](https://issues.apache.org/jira/browse/HUDI-3069) in PR title & commit
    
    - [*] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


