alamb commented on issue #7870:
URL: https://github.com/apache/arrow-rs/issues/7870#issuecomment-3039009809

   > That's tricky, because we don't know, at the time we start a new variant, 
whether it will add new field names and/or use all existing ones. From a 
parquet dictionary encoding perspective, it's attractive to use one/few 
byte-identical metadata dictionaries for the whole column. But getting it wrong 
could bloat things really badly.
   
   I agree the writer will have to be clever, and there is room for substantial 
tradeoffs (like slower writes / better encoding / more memory usage). 
   
   What I suggest is that we provide all the low-level APIs needed (i.e. allow the 
caller to decide whether to create new metadata or to keep using the same one 
even though it might be bloated, etc.).
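
As a rough illustration of that caller-controlled choice, here is a minimal sketch. Every name in it (`MetadataPolicy`, `FieldNameDictionary`, `VariantColumnWriter`, `append_object`) is hypothetical and is not an existing arrow-rs or parquet-variant API. It only models the field-name dictionary, not the actual variant encoding, and reports how many names the metadata for each row would carry under each policy.

```rust
use std::collections::HashMap;

/// Hypothetical policy the caller picks; not an arrow-rs type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetadataPolicy {
    /// Keep extending one shared dictionary for the whole column.
    /// Good for dictionary encoding, but the dictionary may bloat.
    ReuseShared,
    /// Build a fresh dictionary for every value.
    /// Never bloats, but metadata differs from row to row.
    FreshPerValue,
}

/// Hypothetical stand-in for the variant metadata (field-name dictionary).
#[derive(Debug, Default, Clone)]
pub struct FieldNameDictionary {
    names: Vec<String>,
    ids: HashMap<String, u32>,
}

impl FieldNameDictionary {
    /// Return the id of `name`, adding it to the dictionary if it is new.
    pub fn get_or_insert(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    /// Number of distinct field names currently in the dictionary.
    pub fn len(&self) -> usize {
        self.names.len()
    }
}

/// Hypothetical column writer that applies the caller's policy.
pub struct VariantColumnWriter {
    policy: MetadataPolicy,
    shared: FieldNameDictionary,
}

impl VariantColumnWriter {
    pub fn new(policy: MetadataPolicy) -> Self {
        Self {
            policy,
            shared: FieldNameDictionary::default(),
        }
    }

    /// Register the field names of one object and return the number of
    /// names in the dictionary that this row's metadata would reference.
    pub fn append_object(&mut self, field_names: &[&str]) -> usize {
        match self.policy {
            MetadataPolicy::ReuseShared => {
                for name in field_names {
                    self.shared.get_or_insert(name);
                }
                self.shared.len()
            }
            MetadataPolicy::FreshPerValue => {
                let mut fresh = FieldNameDictionary::default();
                for name in field_names {
                    fresh.get_or_insert(name);
                }
                fresh.len()
            }
        }
    }
}
```

In this sketch, a writer built with `MetadataPolicy::ReuseShared` sees its dictionary grow as rows introduce new field names, while one built with `MetadataPolicy::FreshPerValue` keeps each row's dictionary as small as that row needs. A real writer would likely want more options in between, such as starting a new dictionary once the shared one passes a size threshold.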


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
