tustvold opened a new issue, #1764: URL: https://github.com/apache/arrow-rs/issues/1764
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

A significant amount of effort has been put into making the reading of byte arrays from parquet fast:

* https://github.com/apache/arrow-rs/pull/1041
* https://github.com/apache/arrow-rs/pull/1082
* https://github.com/apache/arrow-rs/pull/1180

We should invest some effort in making the writer performance comparable.

**Describe the solution you'd like**

Currently in order to write byte array types from arrow:

* Any dictionaries are hydrated
* Each value from a string array is separately allocated into a `Vec<ByteArray>`
* These values are then written using the ColumnWriter

It would be a significant performance win to be able to elide these first two steps. This would likely involve much the same process as was followed for the reader:

* Generify ColumnWriter to allow writing from different buffers
* Add the ability to write from an arrow ByteArray directly
* Add the ability to write from an arrow dictionary array directly

**Describe alternatives you've considered**

We could not do this
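**Illustrative sketch**

To make the two write paths concrete, here is a rough, std-only sketch. It is not the parquet or arrow crate API. `ByteBuffer`, `put_plain`, `encode_via_copies`, `encode_direct` and `encode_dictionary_direct` are hypothetical names, `Vec<Vec<u8>>` stands in for `Vec<ByteArray>`, and the only encoding shown is parquet PLAIN for BYTE_ARRAY (a 4-byte little-endian length followed by the bytes).

```rust
/// Hypothetical stand-in for an arrow string/binary array:
/// one contiguous values buffer plus value boundaries in `offsets`.
struct ByteBuffer {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl ByteBuffer {
    fn from_strs(items: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for s in items {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len());
        }
        Self { offsets, values }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow value `i` without copying it.
    fn get(&self, i: usize) -> &[u8] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// PLAIN-encode one BYTE_ARRAY value: 4-byte little-endian length, then the bytes.
fn put_plain(out: &mut Vec<u8>, value: &[u8]) {
    out.extend_from_slice(&(value.len() as u32).to_le_bytes());
    out.extend_from_slice(value);
}

/// Roughly the current shape: every value is first copied into its own
/// allocation (here `Vec<Vec<u8>>`), and only then encoded.
fn encode_via_copies(buf: &ByteBuffer) -> Vec<u8> {
    let owned: Vec<Vec<u8>> = (0..buf.len()).map(|i| buf.get(i).to_vec()).collect();
    let mut out = Vec::new();
    for value in &owned {
        put_plain(&mut out, value);
    }
    out
}

/// Proposed shape: encode straight from the contiguous buffer,
/// with no per-value allocation.
fn encode_direct(buf: &ByteBuffer) -> Vec<u8> {
    let mut out = Vec::with_capacity(buf.values.len() + 4 * buf.len());
    for i in 0..buf.len() {
        put_plain(&mut out, buf.get(i));
    }
    out
}

/// Proposed shape for dictionary arrays: follow each key into the dictionary
/// values instead of hydrating the whole array first.
fn encode_dictionary_direct(keys: &[usize], dict: &ByteBuffer) -> Vec<u8> {
    let mut out = Vec::new();
    for &key in keys {
        put_plain(&mut out, dict.get(key));
    }
    out
}
```

All three functions produce the same bytes for the same logical values. The difference is only in how many intermediate allocations and copies happen on the way, which is the cost this issue proposes to remove. Null handling, definition levels, statistics and dictionary-encoded pages are deliberately left out.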
