Re: [PR] Parquet: Do not compress v2 data page when compress is bad quality [arrow-rs]

via GitHub Mon, 08 Sep 2025 11:32:07 -0700


mapleFU commented on PR #8257:
URL: https://github.com/apache/arrow-rs/pull/8257#issuecomment-3266991890


   Generally, different levels of data have different distributions. And, 
similar to what a query optimizer faces, data changes (such as frequent 
insertions or insert overwrite) might require re-sampling the data. So I 
think the runtime config might need to differ from case to case.
   
   Z-order clustering or other clustering might also change the 
distribution score. So for now I think a user config could set its own 
score, maybe a different score for freshly ingested data (which may need 
fast writes) versus well-clustered data (which may need good compression). 
10% is a good intuition, but it's hard to say whether it is good in 
general. When the compressed size is larger than the uncompressed size, 
compressing is definitely worse.
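   A user-configurable threshold like the one described above could be 
sketched as a minimum-savings check. This is a minimal sketch, not the 
arrow-rs API: `CompressionPolicy`, `min_savings` and `keep_compressed` are 
hypothetical names, and the 10% value is only the intuition from the 
discussion. When the check fails, the writer would store the page 
uncompressed, which Parquet v2 data pages allow via the `is_compressed` 
flag in `DataPageHeaderV2`.

```rust
/// Hypothetical per-dataset policy (not part of arrow-rs): keep the
/// compressed bytes only if compression saves at least `min_savings`
/// of the uncompressed size (e.g. 0.10 for 10%).
#[derive(Debug, Clone, Copy)]
pub struct CompressionPolicy {
    /// Minimum fraction of bytes that compression must save, in [0.0, 1.0).
    pub min_savings: f64,
}

impl CompressionPolicy {
    /// Returns true if the compressed page should be written, false if the
    /// page should be stored uncompressed (is_compressed = false in a v2 header).
    pub fn keep_compressed(&self, uncompressed_len: usize, compressed_len: usize) -> bool {
        // Compression that does not shrink the page is always a loss.
        if compressed_len >= uncompressed_len {
            return false;
        }
        let saved = (uncompressed_len - compressed_len) as f64;
        saved >= self.min_savings * uncompressed_len as f64
    }
}

fn main() {
    // Two illustrative settings; the values are assumptions, not recommendations.
    let lenient = CompressionPolicy { min_savings: 0.0 };
    let strict = CompressionPolicy { min_savings: 0.10 };

    // 5% savings: accepted by the lenient policy, rejected by the strict one.
    assert!(lenient.keep_compressed(1000, 950));
    assert!(!strict.keep_compressed(1000, 950));
    // 20% savings: accepted by both.
    assert!(strict.keep_compressed(1000, 800));
    // Compressed output larger than the input: rejected by both.
    assert!(!lenient.keep_compressed(1000, 1020));
    assert!(!strict.keep_compressed(1000, 1020));
    println!("all checks passed");
}
```

   Keeping the threshold as a plain field lets each writer choose its own 
value, in line with the point above that one fixed ratio is hard to 
justify for every dataset.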

