wgtmac commented on issue #34536:
URL: https://github.com/apache/arrow/issues/34536#issuecomment-1467228060

   This is interesting. We need to make these parameters configurable via 
`ColumnProperties` or something before benchmarking. 
   
   By the way, I really doubt we can reach to a clear answer to this question 
in the end. The best encoding ratio is the entropy of the data which requires a 
precise knowledge of probability and distribution of all input words. As the 
result is highly dependent on the data distribution, we need to define and 
prepare some datasets with distinct distribution and pattern.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to