wecharyu opened a new issue, #11283: URL: https://github.com/apache/incubator-gluten/issues/11283
### Description

We found that Parquet files written natively by Gluten are usually larger than those written by vanilla Spark when ZSTD compression is used. Here is an example:

- Vanilla Spark (521M)

```bash
PathInSchema            TotalCompressedSize  TotalUncompressedSize  CompressionRatio
data_type               13848                21019                  1.5178365106874638
feature                 67159774             605809299              9.02041896388752
hit_model_result        14164                14298                  1.009460604349054
key                     797248               1452886                1.8223764750742555
key_type                3234                 8772                   2.712430426716141
raw_session             458181118            19663153226            42.91567778225204
session_last_timestamp  263022               755293                 2.8715962923253566
sop_rule_result         5885623              137256827              23.320696381674463
whitelist_result        782                  12316                  15.749360613810742
```

- Gluten (688M)

```bash
PathInSchema            TotalCompressedSize  TotalUncompressedSize  CompressionRatio
data_type               13171                21608                  1.6405739883076456
feature                 66471981             605795216              9.113542381112428
hit_model_result        13558                13751                  1.0142351379259478
key                     801008               1453807                1.8149718854243653
key_type                2494                 7818                   3.1347233360064153
raw_session             646847228            19293075031            29.826324046641815
session_last_timestamp  398040               847022                 2.1279821123505176
sop_rule_result         6178526              137352990              22.23070518761271
whitelist_result        370                  12058                  32.58918918918919
```

### Gluten version

main branch
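To see which columns drive the size difference reported above, the per-column compressed sizes can be compared directly. The following is a minimal sketch, not part of the original report: the dictionaries simply copy the `TotalCompressedSize` values from the two tables, and the helper name `compressed_deltas` is hypothetical.

```python
# Compressed sizes in bytes, copied from the two tables in this issue.
vanilla = {
    "data_type": 13848,
    "feature": 67159774,
    "hit_model_result": 14164,
    "key": 797248,
    "key_type": 3234,
    "raw_session": 458181118,
    "session_last_timestamp": 263022,
    "sop_rule_result": 5885623,
    "whitelist_result": 782,
}
gluten = {
    "data_type": 13171,
    "feature": 66471981,
    "hit_model_result": 13558,
    "key": 801008,
    "key_type": 2494,
    "raw_session": 646847228,
    "session_last_timestamp": 398040,
    "sop_rule_result": 6178526,
    "whitelist_result": 370,
}


def compressed_deltas(baseline, candidate):
    """Return (column, candidate - baseline) pairs, largest growth first.

    Hypothetical helper for illustration only.
    """
    deltas = {name: candidate[name] - baseline[name] for name in baseline}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


deltas = compressed_deltas(vanilla, gluten)
total_delta = sum(delta for _, delta in deltas)

for name, delta in deltas:
    print(f"{name:24s} {delta:+15,d} bytes")
print(f"{'total':24s} {total_delta:+15,d} bytes")
```

With these numbers, `raw_session` grows by 188,666,110 bytes while the net growth across all columns is 188,407,563 bytes, so that single column accounts for essentially the whole difference; several of the smaller columns are actually slightly smaller in the Gluten output.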
