tustvold commented on issue #4317: URL: https://github.com/apache/arrow-rs/issues/4317#issuecomment-1569910785
The Parquet format defines the key-value metadata as strings (https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L674), and according to the Thrift specification, strings are UTF-8 (https://thrift.apache.org/docs/types). It would therefore be ill-formed for us to write non-UTF-8 data there.

One option might be to support writing arbitrary data before the footer, and then encode just its file offset in the metadata. This is similar to how bloom filters, indices, etc. are stored. Would this be workable? It would mean additional IO on your end to actually fetch this data when needed.
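To make the proposal concrete, here is a minimal sketch of the layout idea using only the Rust standard library. It is a toy in-memory model, not the `parquet` crate's API. `ToyFile`, `write_blob`, `read_blob` and the key names `example.blob.offset` / `example.blob.length` are all hypothetical. It also skips the real footer framing (Thrift-encoded `FileMetaData`, 4-byte length, trailing `PAR1` magic). The writer appends the raw, possibly non-UTF-8 bytes before the footer and stores only their offset and length as decimal UTF-8 strings in the key-value metadata. The reader then needs one extra ranged read to fetch them.

```rust
use std::collections::HashMap;

// Hypothetical metadata keys; not defined by the Parquet spec or arrow-rs.
const BLOB_OFFSET_KEY: &str = "example.blob.offset";
const BLOB_LENGTH_KEY: &str = "example.blob.length";

/// Toy stand-in for a Parquet file: `bytes` holds everything written before
/// the footer, `kv_metadata` models the footer's key_value_metadata, which
/// may only contain UTF-8 strings.
struct ToyFile {
    bytes: Vec<u8>,
    kv_metadata: HashMap<String, String>,
}

impl ToyFile {
    fn new(column_data: &[u8]) -> Self {
        ToyFile { bytes: column_data.to_vec(), kv_metadata: HashMap::new() }
    }

    /// Append an arbitrary blob before the footer and record only its
    /// location, as decimal strings, in the key-value metadata.
    fn write_blob(&mut self, blob: &[u8]) {
        let offset = self.bytes.len();
        self.bytes.extend_from_slice(blob);
        self.kv_metadata.insert(BLOB_OFFSET_KEY.to_string(), offset.to_string());
        self.kv_metadata.insert(BLOB_LENGTH_KEY.to_string(), blob.len().to_string());
    }

    /// Reader side: parse offset and length from the metadata, then perform
    /// the additional ranged read over the file bytes.
    fn read_blob(&self) -> Option<&[u8]> {
        let offset: usize = self.kv_metadata.get(BLOB_OFFSET_KEY)?.parse().ok()?;
        let length: usize = self.kv_metadata.get(BLOB_LENGTH_KEY)?.parse().ok()?;
        self.bytes.get(offset..offset.checked_add(length)?)
    }
}

fn main() {
    // 0xFF and 0xFE never occur in valid UTF-8, so this blob could not be
    // stored directly as a Thrift string value.
    let blob: &[u8] = &[0xFF, 0xFE, 0x00, 0x80];
    assert!(std::str::from_utf8(blob).is_err());

    let mut file = ToyFile::new(b"PAR1<column chunks>");
    file.write_blob(blob);
    assert_eq!(file.read_blob(), Some(blob));
    println!(
        "offset={} length={}",
        file.kv_metadata[BLOB_OFFSET_KEY], file.kv_metadata[BLOB_LENGTH_KEY]
    );
}
```

The design trade-off mirrors the comment: the metadata stays well-formed UTF-8, at the cost of a second read (`read_blob`) whenever the payload is actually needed.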
