felipecrv commented on issue #44183:
URL: https://github.com/apache/arrow/issues/44183#issuecomment-2369638060

   > Why wouldn't it bring great compression rates if the developer knows the 
column is mostly constant values?
   
   All it takes is one column (struct field) with random or misaligned values 
to break up the runs: encoding the struct compares whole struct values, so a 
new run starts whenever *any* field changes.
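To illustrate, here is a minimal plain-Python model of run-end encoding (not the Arrow API). `run_end_encode` below is a hypothetical helper that compares whole elements, the same way encoding a struct compares whole struct values. The field names and data are made up.

```python
from itertools import groupby

def run_end_encode(values):
    """Hypothetical helper: return (run_ends, run_values), comparing whole elements."""
    run_ends, run_values = [], []
    end = 0
    for value, group in groupby(values):
        end += sum(1 for _ in group)  # length of this run
        run_ends.append(end)
        run_values.append(value)
    return run_ends, run_values

# A "struct" column modeled as row tuples (status, region, request_id).
# status and region are constant; request_id is unique per row.
n = 1000
status = ["ok"] * n
region = ["eu"] * n
request_id = list(range(n))

struct_run_ends, _ = run_end_encode(list(zip(status, region, request_id)))
print(len(struct_run_ends))  # 1000: one run per row, no compression at all

# Without the unique field, the same rows collapse to a single run.
print(len(run_end_encode(list(zip(status, region)))[0]))  # 1
```

One unique field is enough to turn every row into its own run, even though two of the three fields are constant.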
   
   If you *know* the data is mostly constant values, you don't need 
`run_end_encode`, because you can produce the run-end encoded array directly 
without comparing the struct values.
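A minimal sketch of producing the encoding directly, assuming the producer already knows each run as a `(value, length)` pair (`ree_from_runs` and `decode` are hypothetical helpers, not Arrow functions). The run ends are just a running sum of the run lengths, so no values are ever compared.

```python
from itertools import accumulate

def ree_from_runs(runs):
    """Build (run_ends, values) from known (value, length) pairs, without comparisons."""
    for _, length in runs:
        if length <= 0:
            # Arrow requires strictly increasing run ends, so every run must be non-empty.
            raise ValueError("run lengths must be positive")
    values = [value for value, _ in runs]
    run_ends = list(accumulate(length for _, length in runs))
    return run_ends, values

def decode(run_ends, values):
    """Expand a run-end encoded pair back into a plain list (for checking)."""
    out, start = [], 0
    for end, value in zip(run_ends, values):
        out.extend([value] * (end - start))
        start = end
    return out

# The producer knows the column is "a" for 3 rows, then "b" for 2 rows.
run_ends, values = ree_from_runs([("a", 3), ("b", 2)])
print(run_ends, values)          # [3, 5] ['a', 'b']
print(decode(run_ends, values))  # ['a', 'a', 'a', 'b', 'b']
```

In pyarrow, a pair like this can be turned into an actual array with `pa.RunEndEncodedArray.from_arrays(run_ends, values)` (available in recent pyarrow versions).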
   
   You can also go for a struct of run-end-encoded fields (not all of them have 
to be run-end-encoded) and if the whole struct repeats you can share the same 
`run_ends` array among the fields (no copying needed).
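A plain-Python sketch of that layout, under illustrative assumptions: each encoded field is modeled as a `("ree", run_ends, values)` tuple and a plain field as `("plain", values)`, and the field names are made up. The encoded fields `host` and `zone` change together, so one `run_ends` list is computed once and shared by reference; `seq` is unique per row and stays a plain array.

```python
from itertools import groupby

host = ["h1", "h1", "h1", "h2", "h2"]
zone = ["z1", "z1", "z1", "z9", "z9"]
seq = [101, 102, 103, 104, 105]  # unique per row: not worth run-end encoding

# host and zone repeat together, so their runs line up: compute run ends once.
run_ends, run_starts = [], []
end = 0
for _, group in groupby(zip(host, zone)):
    run_starts.append(end)
    end += sum(1 for _ in group)
    run_ends.append(end)

struct_column = {
    # Both encoded fields hold the same run_ends list object: nothing is copied.
    "host": ("ree", run_ends, [host[s] for s in run_starts]),
    "zone": ("ree", run_ends, [zone[s] for s in run_starts]),
    "seq": ("plain", seq),
}

print(run_ends)                                            # [3, 5]
print(struct_column["host"][1] is struct_column["zone"][1])  # True
```

Only the per-field values arrays differ; the shared run ends mean the repetition is stored once for the whole group of fields.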
   