wecharyu opened a new issue, #11283:
URL: https://github.com/apache/incubator-gluten/issues/11283

   ### Description
   
   We found that Parquet files written natively by Gluten are usually larger
   than those written by vanilla Spark when ZSTD compression is used. Here is an example:
   - Vanilla Spark (521M)
   ```bash
   PathInSchema    TotalCompressedSize     TotalUncompressedSize   CompressionRatio
   data_type               13848      21019        1.5178365106874638
   feature                 67159774   605809299    9.02041896388752
   hit_model_result        14164      14298        1.009460604349054
   key                     797248     1452886      1.8223764750742555
   key_type                3234       8772         2.712430426716141
   raw_session             458181118  19663153226  42.91567778225204
   session_last_timestamp  263022     755293       2.8715962923253566
   sop_rule_result         5885623    137256827    23.320696381674463
   whitelist_result        782        12316        15.749360613810742
   ```
   - Gluten (688M)
   ```bash
   PathInSchema    TotalCompressedSize     TotalUncompressedSize   CompressionRatio
   data_type               13171      21608        1.6405739883076456
   feature                 66471981   605795216    9.113542381112428
   hit_model_result        13558      13751        1.0142351379259478
   key                     801008     1453807      1.8149718854243653
   key_type                2494       7818         3.1347233360064153
   raw_session             646847228  19293075031  29.826324046641815
   session_last_timestamp  398040     847022       2.1279821123505176
   sop_rule_result         6178526    137352990    22.23070518761271
   whitelist_result        370        12058        32.58918918918919
   ```
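
To make the comparison easier to read, here is a minimal sketch that computes the per-column difference in compressed size between the two listings above. The byte counts are copied from the listings; the helper name `size_deltas` and the dictionary names are illustrative only and are not part of Gluten or Spark.

```python
# Compressed sizes in bytes, copied from the two listings above.
vanilla = {
    "data_type": 13848,
    "feature": 67159774,
    "hit_model_result": 14164,
    "key": 797248,
    "key_type": 3234,
    "raw_session": 458181118,
    "session_last_timestamp": 263022,
    "sop_rule_result": 5885623,
    "whitelist_result": 782,
}
gluten = {
    "data_type": 13171,
    "feature": 66471981,
    "hit_model_result": 13558,
    "key": 801008,
    "key_type": 2494,
    "raw_session": 646847228,
    "session_last_timestamp": 398040,
    "sop_rule_result": 6178526,
    "whitelist_result": 370,
}


def size_deltas(baseline, candidate):
    """Return (column, delta_bytes, relative_change) tuples, largest delta first."""
    rows = []
    for column, base_size in baseline.items():
        delta = candidate[column] - base_size
        rows.append((column, delta, delta / base_size))
    return sorted(rows, key=lambda row: row[1], reverse=True)


deltas = size_deltas(vanilla, gluten)
for column, delta, relative in deltas:
    print(f"{column:24s} {delta:+14d} bytes  {relative:+.1%}")
```

With these numbers, `raw_session` is the column with the largest increase (about 41% more compressed bytes under Gluten), and it accounts for nearly all of the difference in file size.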
   
   ### Gluten version
   
   main branch


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

