mapleFU commented on PR #8257: URL: https://github.com/apache/arrow-rs/pull/8257#issuecomment-3266991890
Different levels of data generally have different distributions, and, as query optimizers also run into, data changes (such as frequent insertions or insert overwrite) may require re-sampling the data. So I think the runtime config would differ from one dataset to another. Z-order clustering or other clustering may also change the distribution score.

So my current thinking is: a user config could set its own score, perhaps with a different score for freshly ingested data (which may need fast writes) than for well-clustered data (which may need good compression). 10% is a good intuition, but it's hard to show that it's the right value. When the compressed size is larger than the uncompressed size, compression is certainly worse.
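To make the idea of a user-set score concrete, here is a minimal sketch in Rust. Everything in it is hypothetical and not part of arrow-rs or this PR: the `DataProfile` enum, the `CompressionPolicy` struct, and the percentage values are assumptions chosen only for illustration. It assumes the "score" is the minimum percentage of bytes that compression must save for the compressed form to be kept.

```rust
/// Hypothetical data profiles (illustrative only, not an arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataProfile {
    /// Just-ingested data, where write speed matters more than size.
    FreshlyIngested,
    /// Well-clustered data, where a good compression ratio matters more.
    WellClustered,
}

/// Hypothetical user config: minimum saving, in percent of the
/// uncompressed size, required to keep the compressed form.
#[derive(Debug, Clone, Copy)]
struct CompressionPolicy {
    min_saving_pct_ingest: u64,
    min_saving_pct_clustered: u64,
}

impl CompressionPolicy {
    /// Threshold for the given profile, capped at 100 percent.
    fn min_saving_pct(&self, profile: DataProfile) -> u64 {
        let pct = match profile {
            DataProfile::FreshlyIngested => self.min_saving_pct_ingest,
            DataProfile::WellClustered => self.min_saving_pct_clustered,
        };
        pct.min(100)
    }

    /// Returns true if the compressed form should be kept.
    /// Integer arithmetic avoids floating-point rounding at the boundary.
    fn keep_compressed(&self, profile: DataProfile, uncompressed: u64, compressed: u64) -> bool {
        // Compression that does not shrink the data is never worth keeping.
        if compressed >= uncompressed {
            return false;
        }
        let required = self.min_saving_pct(profile) as u128;
        // Keep if compressed <= uncompressed * (100 - required) / 100.
        (compressed as u128) * 100 <= (uncompressed as u128) * (100 - required)
    }
}
```

With thresholds of, say, 20% for freshly ingested data and 5% for well-clustered data, a block that shrinks from 1000 to 900 bytes would be stored uncompressed in the first case and compressed in the second. Whether the stricter threshold belongs on the ingest side is itself an assumption here.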