weimingdiit commented on PR #9313: URL: https://github.com/apache/hudi/pull/9313#issuecomment-1656990702
My test report. The data volume, memory, and GC parameters were kept consistent across both runs.

Before the optimization: GC accounts for 33.52% of the overall sampling, and the Spark write stage takes 14 min.
(flame graph and Spark UI screenshots attached)

After the optimization: GC accounts for 9.5% of the overall sampling, and the Spark write stage takes 9.3 min.
(flame graph and Spark UI screenshots attached)

Summary: with a large amount of data, GC overhead drops by about 24 percentage points (33.52% → 9.5%).

Note: we found that the earlier code used `Hex.encodeHex`, and it is not clear why it was changed to `String.format`. Was it done to drop the dependency, per "No longer depends on incl commons-codec, commons-io, commons-pool, commons-dbcp, commons-lang, commons-logging, avro-mapred"? See: https://github.com/apache/hudi/pull/873
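For context, a minimal sketch (not the actual Hudi code; class and method names are made up for illustration) of why the two approaches differ so much in GC behavior: `String.format("%02x", b)` builds a new `Formatter`, a varargs `Object[]`, and a boxed `Byte` for every byte, while a lookup-table encoder (the technique commons-codec's `Hex.encodeHex` uses internally) allocates only the output array.

```java
public class HexSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Allocation-heavy path: one Formatter + boxed argument per byte,
    // which is what drives the GC pressure seen in the flame graph.
    static String formatHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Lookup-table path: a single char[] allocation and no boxing.
    static String tableHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;          // unsigned byte value
            out[i * 2] = DIGITS[v >>> 4];     // high nibble
            out[i * 2 + 1] = DIGITS[v & 0x0F]; // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF};
        System.out.println(formatHex(sample)); // deadbeef
        System.out.println(tableHex(sample));  // deadbeef
    }
}
```

Both produce identical output; only the allocation profile differs, which matches the GC-sampling drop reported above.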
