[ https://issues.apache.org/jira/browse/HUDI-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-7578:
----------------------------
Fix Version/s: 0.15.0, 1.0.0
> Avoid unnecessary rewriting when copy old data from old base to new base file
> to improve compaction performance
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HUDI-7578
> URL: https://issues.apache.org/jira/browse/HUDI-7578
> Project: Apache Hudi
> Issue Type: Improvement
> Components: core
> Reporter: Jing Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> After upgrading a Hudi table from version 0.10 to version 0.14, the compaction
> job became much slower.
> The table is a MOR table without a partition field, and it does not use any
> schema evolution.
> The compaction job finishes in 52 minutes on version 0.14, but in 25 minutes
> on version 0.10.
> In version 0.14, the task's jstack output is also much more complex. It
> includes the following:
> !https://private-user-images.githubusercontent.com/1525333/320377766-9394a3b4-3074-4ba5-bd07-7c73f195085f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTI1NjQ4ODAsIm5iZiI6MTcxMjU2NDU4MCwicGF0aCI6Ii8xNTI1MzMzLzMyMDM3Nzc2Ni05Mzk0YTNiNC0zMDc0LTRiYTUtYmQwNy03YzczZjE5NTA4NWYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDhUMDgyMzAwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZTk3M2E1NjVkZDYwNTZiNTllZmUwOWEzOTNlMzEwMDA5NDBjYzk1NDE1ZDk4NjQ5ODM0ZjM3N2MwMmFmNzQ3ZSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.XCgE-sg9BovCyB7USURbPJfcaTB0NKLopRDZJXB-8os!
> After comparing versions 0.14 and 0.10, we found a difference in how old
> records are copied from the old base file to the new base file.
> In version 0.14, the copy is much more expensive.
> !https://private-user-images.githubusercontent.com/1525333/320378794-879b0f8e-dbc8-458b-9b45-afdced25580c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTI1NjQ4ODAsIm5iZiI6MTcxMjU2NDU4MCwicGF0aCI6Ii8xNTI1MzMzLzMyMDM3ODc5NC04NzliMGY4ZS1kYmM4LTQ1OGItOWI0NS1hZmRjZWQyNTU4MGMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDhUMDgyMzAwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NTc0ZjIzNTQ5NGFhMjY4NDBjNGU0MzFmM2MyY2JhZjVjNjM5YmU1Mjk1Njk5MmM1MjA0NDI1M2FiMjAxZjkzYiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.Vwjh9nyBAXiVEgSTquFVke-8brqi87QkmgI7uYB5ooI!
> !https://private-user-images.githubusercontent.com/1525333/320379033-d22835b2-7d6c-44ae-aaf1-967d1622c9ae.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTI1NjQ4ODAsIm5iZiI6MTcxMjU2NDU4MCwicGF0aCI6Ii8xNTI1MzMzLzMyMDM3OTAzMy1kMjI4MzViMi03ZDZjLTQ0YWUtYWFmMS05NjdkMTYyMmM5YWUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDhUMDgyMzAwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MDBjYzk3OTEyM2I3Njc2OTJiODZlYjI1ZjcxZjA0ZjE0NDdlNzI1YTljYmZkOTA5ODNmNmE5YjVkNDVkMTkwZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.5KQiAFWIbAV3HXGRb4K0c5J2R8C_BRvlsjE5dGvkDDA!
>
> !https://private-user-images.githubusercontent.com/1525333/320379285-438984f7-5d3f-4635-ae64-d3221d73cc34.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTI1NjQ4ODAsIm5iZiI6MTcxMjU2NDU4MCwicGF0aCI6Ii8xNTI1MzMzLzMyMDM3OTI4NS00Mzg5ODRmNy01ZDNmLTQ2MzUtYWU2NC1kMzIyMWQ3M2NjMzQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDhUMDgyMzAwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZDg0M2M1OTBkOTYxNWUwYTU2NzI3MmE3NzhlYTQzY2M3YmFmZTdlZWQ1YWNhMGQzY2FhMjk1ZTQ1ODI1MmQxMCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.OhoLRMr_PSLPgs9CY3vRcc_kwhXKped41HnsXR35poE!
> !https://private-user-images.githubusercontent.com/1525333/320379415-e1d5ddb4-1544-4f17-b9f9-6193765c8bed.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTI1NjQ4ODAsIm5iZiI6MTcxMjU2NDU4MCwicGF0aCI6Ii8xNTI1MzMzLzMyMDM3OTQxNS1lMWQ1ZGRiNC0xNTQ0LTRmMTctYjlmOS02MTkzNzY1YzhiZWQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDhUMDgyMzAwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YTFiYTA1Mzk3MTg5ZWY5YzlmYjdmZGRiNTc5Yjg4YmQyMzhlNWJhMTkyODlkMTVhYzMzZmY0YjQ1NThkYWY5MyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.yQt9r2GOaGi5S0QxXbeFC9LnBYzTBmJ-Zg-Eu0CUNHY!
> In version 0.10, the copy is simpler.
> !https://private-user-images.githubusercontent.com/1525333/320379946-28eb2af7-e0f2-43b7-bfc7-f174e30cd944.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTI1NjQ4ODAsIm5iZiI6MTcxMjU2NDU4MCwicGF0aCI6Ii8xNTI1MzMzLzMyMDM3OTk0Ni0yOGViMmFmNy1lMGYyLTQzYjctYmZjNy1mMTc0ZTMwY2Q5NDQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDhUMDgyMzAwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZDg5NzQ5NDEzNGMxNjdjZWFlZWRhZDU3OGZhOGMxYWUzMzMyZjQwNTU4NDM4M2NhOTg2Yjc5M2M4NjMzZmZiMSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.vApjCzWRF_81VLpwo7UDhcCTCEkw3kMfPFFb8JOgqhU!
>
> Rewriting every field value of each old record is not necessary; updating
> the new file path value and the metadata fields is enough.
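
To illustrate the difference described above, here is a minimal sketch in plain Java. It is not Hudi's actual code: the class `BaseFileCopySketch`, its methods, and the array-based record model are hypothetical, chosen only to contrast a full per-field rewrite with an update of a single metadata slot. The only name taken from Hudi is the `_hoodie_file_name` metadata column. The sketch assumes the schema is unchanged between the old and new base file, as in the report, and for brevity it updates only the file name field.

```java
// Hypothetical sketch, not Hudi's implementation. A record is modeled as an
// Object[] whose slots follow a fixed list of field names (the "schema").
final class BaseFileCopySketch {
    // Hudi metadata column that stores the name of the file holding the record.
    static final String FILE_NAME_FIELD = "_hoodie_file_name";

    // Slow path: allocate a new record and copy every field by name lookup,
    // then set the new file name.
    static Object[] rewriteAllFields(String[] schema, Object[] oldRecord, String newFileName) {
        Object[] copy = new Object[schema.length];
        for (int i = 0; i < schema.length; i++) {
            int source = indexOf(schema, schema[i]); // per-field lookup on every record
            copy[i] = oldRecord[source];
        }
        copy[indexOf(schema, FILE_NAME_FIELD)] = newFileName;
        return copy;
    }

    // Fast path: the schema is unchanged, so reuse the record and overwrite
    // only the file name slot.
    static Object[] updateFileNameOnly(String[] schema, Object[] oldRecord, String newFileName) {
        oldRecord[indexOf(schema, FILE_NAME_FIELD)] = newFileName;
        return oldRecord;
    }

    // Linear search for a field position; fails loudly on an unknown field.
    static int indexOf(String[] schema, String field) {
        for (int i = 0; i < schema.length; i++) {
            if (schema[i].equals(field)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unknown field: " + field);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        String[] schema = {"_hoodie_commit_time", "_hoodie_file_name", "id", "name"};
        Object[] record = {"20240408082300", "old-base.parquet", 1L, "a"};

        Object[] slow = rewriteAllFields(schema, record.clone(), "new-base.parquet");
        Object[] fast = updateFileNameOnly(schema, record.clone(), "new-base.parquet");

        // Both paths produce the same record contents.
        System.out.println(java.util.Arrays.equals(slow, fast));
    }
}
```

Both methods return records with identical contents. The fast path skips the allocation and the per-field copy, which is the saving this issue asks for when no schema evolution is involved.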
