felipecrv commented on issue #44183: URL: https://github.com/apache/arrow/issues/44183#issuecomment-2369638060
> Why wouldn't it bring great compression rates if the developer knows the column is mostly constant values?

All it takes is a new random or misaligned column (struct field) to mess up the repetitiveness of the data.

If you *know* the data is mostly constant values, you don't need `run_end_encode`, because you can produce the run-end encoded array directly without comparing the struct values. You can also go for a struct of run-end-encoded fields (not all of them have to be run-end-encoded), and if the whole struct repeats you can share the same `run_ends` array among the fields (no copying needed).
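To make the point above concrete, here is a minimal pure-Python sketch of the run-end encoded layout. It is a toy model, not the Arrow implementation: the `RunEndEncoded` class, the local `run_end_encode` helper, and the `sensor`/`unit`/`reading` fields are all hypothetical names chosen for illustration. It shows (a) one noisy field defeating whole-struct encoding, (b) building the encoded form directly when the runs are already known, and (c) several fields sharing one `run_ends` list.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, List


@dataclass
class RunEndEncoded:
    """Toy model: values[i] covers logical rows [run_ends[i-1], run_ends[i])."""
    run_ends: List[int]  # strictly increasing; last entry is the logical length
    values: List[Any]

    def __len__(self) -> int:
        return self.run_ends[-1] if self.run_ends else 0

    def __getitem__(self, i: int) -> Any:
        if not 0 <= i < len(self):
            raise IndexError(i)
        # The first run whose end is greater than i holds logical row i.
        return self.values[bisect_right(self.run_ends, i)]

    def decode(self) -> List[Any]:
        out, start = [], 0
        for end, value in zip(self.run_ends, self.values):
            out.extend([value] * (end - start))
            start = end
        return out


def run_end_encode(column: List[Any]) -> RunEndEncoded:
    """Generic encoder: has to compare every element with its predecessor."""
    run_ends: List[int] = []
    values: List[Any] = []
    for i, value in enumerate(column):
        if values and values[-1] == value:
            run_ends[-1] = i + 1  # extend the current run
        else:
            run_ends.append(i + 1)  # start a new run
            values.append(value)
    return RunEndEncoded(run_ends, values)


# (b) The producer already knows the runs: 4 identical rows, then 2 more.
shared_run_ends = [4, 6]
# (c) Both repetitive fields point at the same run_ends list (no copy).
sensor = RunEndEncoded(shared_run_ends, ["a", "b"])
unit = RunEndEncoded(shared_run_ends, ["C", "F"])
# A noisy field is simply left as a plain column.
reading = [20.1, 20.3, 19.8, 20.0, 68.2, 68.9]

# (a) Encoding whole struct rows: the noisy field makes every row unique.
rows = list(zip(sensor.decode(), unit.decode(), reading))
whole_struct = run_end_encode(rows)
```

Here `whole_struct` ends up with one run per row, while `sensor` and `unit` each store two values and share a single two-entry `run_ends` list. In pyarrow, the directly constructed form would presumably be built with `RunEndEncodedArray.from_arrays(run_ends, values)`; treat that mapping as an assumption to check against the pyarrow documentation.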
