alamb commented on issue #7870: URL: https://github.com/apache/arrow-rs/issues/7870#issuecomment-3039009809
> That's tricky, because we don't know, at the time we start a new variant, whether it will add new field names and/or use all existing ones. From a parquet dictionary encoding perspective, it's attractive to use one/few byte-identical metadata dictionaries for the whole column. But getting it wrong could bloat things really badly.

I agree the writer will have to be clever, and there is room for substantial tradeoffs (such as slower writes, better encoding, or more memory usage).

What I suggest is that we provide all the low-level APIs needed, i.e. let the caller decide whether to create new metadata or keep using the same metadata even though it might be bloated.
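As an illustration of the kind of decision such low-level APIs would leave to the caller, here is a minimal sketch. Everything in it is hypothetical: `FieldNames`, `Choice`, `choose`, and the `max_unused` threshold are illustrative names, not part of arrow-rs or the parquet-variant crate, and a metadata dictionary is modeled simply as a set of field names.

```rust
use std::collections::BTreeSet;

// Hypothetical sketch, not an arrow-rs API: a metadata dictionary is
// modeled as nothing more than the set of field names it contains.
type FieldNames = BTreeSet<&'static str>;

/// What the caller decided to do for one variant value.
#[derive(Debug, PartialEq)]
enum Choice {
    /// Reuse the shared dictionary byte-for-byte (good for parquet dictionary encoding).
    ReuseShared,
    /// Write a fresh dictionary holding only the names this value uses.
    NewDictionary,
}

/// Hypothetical caller-side policy: reuse the shared dictionary only if it
/// already contains every name the value needs and at most `max_unused`
/// of its entries would go unused (a crude guard against bloat).
fn choose(shared: &FieldNames, value_fields: &FieldNames, max_unused: usize) -> Choice {
    if !value_fields.is_subset(shared) {
        // The value introduces new field names the shared dictionary lacks.
        return Choice::NewDictionary;
    }
    // Safe: a subset can never be larger than its superset.
    let unused = shared.len() - value_fields.len();
    if unused <= max_unused {
        Choice::ReuseShared
    } else {
        Choice::NewDictionary
    }
}

fn main() {
    let shared: FieldNames = ["id", "name", "tags"].into_iter().collect();
    let a: FieldNames = ["id", "name"].into_iter().collect();
    let b: FieldNames = ["id", "email"].into_iter().collect();
    let c: FieldNames = ["id"].into_iter().collect();

    println!("{:?}", choose(&shared, &a, 1)); // ReuseShared: one unused entry
    println!("{:?}", choose(&shared, &b, 1)); // NewDictionary: "email" is new
    println!("{:?}", choose(&shared, &c, 1)); // NewDictionary: two unused entries
}
```

The unused-entry threshold is only one possible heuristic; the point is that the policy lives with the caller, who can trade write speed, encoding size, and memory as they see fit.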