tustvold commented on issue #1627:
URL: https://github.com/apache/arrow-rs/issues/1627#issuecomment-1113791088

   Some ideas to try:
   
   - Disable dictionary compression for columns that don't have repeated values
   - Use writer version 2, which has better string encoding
   - Represent the id / sequence as an integral type instead of a 
variable-length string
   - Try without snappy, as compression may not always yield benefits
   - Maybe try writing the data using something like pyarrow to determine if 
this is something specific to the Rust implementation
   
   Without the data it is hard to say for sure what is going on, but ignoring 
compression, Parquet has at least a 4-byte overhead per string (the length 
prefix that plain encoding stores before each value), so in the case of lots 
of small strings...
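
To make the per-string overhead concrete, here is a rough sketch in std-only Rust. It assumes PLAIN encoding, where each BYTE_ARRAY value is a 4-byte length followed by its bytes. It ignores page and column headers, definition levels, dictionary encoding and compression. The function names and the sample ids are hypothetical and are not taken from the issue.

```rust
/// Bytes needed for `values` under Parquet PLAIN encoding of BYTE_ARRAY:
/// each value is written as a 4-byte length followed by its raw bytes.
fn plain_byte_array_size(values: &[&str]) -> usize {
    values.iter().map(|v| 4 + v.len()).sum()
}

/// Bytes needed for `count` values under PLAIN encoding of INT64
/// (a fixed 8 bytes per value, no length prefix).
fn plain_int64_size(count: usize) -> usize {
    count * std::mem::size_of::<i64>()
}

fn main() {
    // Hypothetical sample: 1000 seven-digit ids stored as strings.
    let ids: Vec<String> = (1_000_000..1_001_000u32).map(|i| i.to_string()).collect();
    let refs: Vec<&str> = ids.iter().map(String::as_str).collect();

    // Raw character bytes, without any length prefixes.
    let payload: usize = refs.iter().map(|v| v.len()).sum();
    let as_strings = plain_byte_array_size(&refs);
    let as_ints = plain_int64_size(refs.len());

    println!("raw string bytes:       {payload}");
    println!("PLAIN BYTE_ARRAY bytes: {as_strings}");
    println!("length-prefix overhead: {}", as_strings - payload);
    println!("PLAIN INT64 bytes:      {as_ints}");
}
```

For this made-up sample the length prefixes add 4,000 bytes on top of 7,000 bytes of digits, while the same ids as 64-bit integers would take 8,000 bytes before any encoding or compression. Real files will differ, so treat this as a back-of-the-envelope estimate only.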

