pitrou commented on issue #38441:
URL: https://github.com/apache/arrow/issues/38441#issuecomment-1780857267

   Well, for example, the only point of BYTE_STREAM_SPLIT is that the encoded 
output compresses better. Without compression, the encoded output is exactly 
the same size as the input.
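
To make the size argument concrete, here is a minimal pure-Python sketch of the byte-splitting idea. It is not Arrow's implementation: the helper names `byte_stream_split` and `byte_stream_unsplit` are made up for illustration, and zlib stands in for whatever codec a writer would actually use.

```python
import math
import struct
import zlib


def byte_stream_split(raw: bytes, width: int) -> bytes:
    """Gather byte k of every value into stream k, then concatenate the streams.

    Assumes len(raw) is a multiple of `width` (the value size in bytes).
    """
    return b"".join(raw[k::width] for k in range(width))


def byte_stream_unsplit(encoded: bytes, width: int) -> bytes:
    """Inverse of byte_stream_split: interleave the streams back into values."""
    n = len(encoded) // width
    out = bytearray(len(encoded))
    for k in range(width):
        out[k::width] = encoded[k * n:(k + 1) * n]
    return bytes(out)


# Illustrative data only: a smooth float32 signal, packed little-endian.
values = [math.sin(i / 100.0) for i in range(10_000)]
plain = struct.pack(f"<{len(values)}f", *values)
split = byte_stream_split(plain, 4)

# The encoding only reorders bytes, so both buffers are 40000 bytes long.
print(len(plain), len(split))

# Any benefit only shows up once a codec runs over the bytes.
print(len(zlib.compress(plain)), len(zlib.compress(split)))
```

The first line of output shows identical sizes; the second is where the two layouts can differ, which is why a size-based comparison of encodings says nothing about BYTE_STREAM_SPLIT unless compression is part of the measurement.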
   
   So I'm afraid the strategy should take compression into account _somehow_, 
if some kind of sampling is used.
   
   That said, the strategy could also avoid sampling entirely and instead rely 
on simple heuristics (which ones?).
   
   All this probably means that:
   * we should offer a flexible API for users to implement their own strategy
   * we should also offer a decent default strategy for the majority of users
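
As a purely hypothetical illustration of those two points (this is not a proposed Arrow API; `EncodingStrategy`, `SmallestCompressedSample`, `AlwaysSplit` and the encoder table are all invented names), a pluggable strategy with a sampling-based default could look roughly like this in Python:

```python
import zlib
from typing import Callable, Dict, Protocol

# Hypothetical: an encoder maps raw column bytes to encoded bytes.
Encoder = Callable[[bytes], bytes]


class EncodingStrategy(Protocol):
    """Hypothetical user-facing hook: pick an encoding name for a column sample."""

    def choose(self, sample: bytes, encoders: Dict[str, Encoder]) -> str: ...


class SmallestCompressedSample:
    """One possible default: encode a sample, compress it, keep the smallest.

    The codec is injected, so the choice reflects the compression in use.
    Ties go to the first encoder in the table.
    """

    def __init__(self, compress: Callable[[bytes], bytes] = zlib.compress):
        self.compress = compress

    def choose(self, sample: bytes, encoders: Dict[str, Encoder]) -> str:
        return min(
            encoders,
            key=lambda name: len(self.compress(encoders[name](sample))),
        )


class AlwaysSplit:
    """A user-defined strategy that skips sampling and applies a fixed rule."""

    def choose(self, sample: bytes, encoders: Dict[str, Encoder]) -> str:
        return "BYTE_STREAM_SPLIT"


def split4(raw: bytes) -> bytes:
    # Illustrative byte-stream split for 4-byte values.
    return b"".join(raw[k::4] for k in range(4))


ENCODERS: Dict[str, Encoder] = {"PLAIN": lambda raw: raw, "BYTE_STREAM_SPLIT": split4}

sample = bytes(range(16))
# With no compression every candidate has the same size, so the tie falls to PLAIN.
print(SmallestCompressedSample(compress=lambda b: b).choose(sample, ENCODERS))
print(AlwaysSplit().choose(sample, ENCODERS))
```

Passing an identity function as the codec reproduces the situation described above: the sampled sizes are all equal, so sampling alone cannot favor BYTE_STREAM_SPLIT.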
   
   I can draft an API when I find the time, probably in a couple of weeks 
(feel free to ping me :-)).
   

