tustvold commented on issue #4317:
URL: https://github.com/apache/arrow-rs/issues/4317#issuecomment-1569910785

   The parquet format defines the key value metadata as strings - 
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L674.
 According to the thrift specification, strings are UTF-8 encoded - 
https://thrift.apache.org/docs/types. It would therefore be ill-formed for us 
to write non-UTF-8 data here.
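
For illustration only: a Rust `String`, like a thrift `string`, must hold valid UTF-8. Arbitrary bytes therefore cannot be stored as a metadata value without either rejecting them or converting them lossily. This is a standard-library sketch; `as_kv_value` is a hypothetical helper, not an arrow-rs API.

```rust
/// Hypothetical helper: accepts bytes as a key-value metadata value only if
/// they are valid UTF-8, mirroring the thrift `string` requirement.
fn as_kv_value(bytes: &[u8]) -> Option<String> {
    String::from_utf8(bytes.to_vec()).ok()
}

fn main() {
    // Plain text is fine.
    assert_eq!(as_kv_value(b"created_by"), Some("created_by".to_string()));
    // 0xff never appears in well-formed UTF-8, so raw binary is rejected.
    assert_eq!(as_kv_value(&[0xff, 0xfe]), None);
    // A lossy conversion "succeeds" but replaces 0xff with U+FFFD,
    // so the original bytes cannot be recovered.
    let lossy = String::from_utf8_lossy(&[0xff]);
    assert_eq!(lossy.as_bytes(), &[0xEF, 0xBF, 0xBD]);
}
```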
   
   One option might be to support writing arbitrary data before the footer, and 
then encode just that data's file offset in the metadata. This is similar to how bloom 
filters, indices, etc. are stored. Would this be workable? It would mean 
additional IO on your end to actually fetch this data when needed.
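
A minimal sketch of the proposed layout, assuming an in-memory `Vec<u8>` stands in for the file and only the standard library is used. The key name `example.blob.location`, the `offset:len` encoding, and the `write_blob`/`read_blob` helpers are all hypothetical. They are not part of the Parquet spec or the arrow-rs writer API.

```rust
use std::collections::HashMap;

/// Hypothetical metadata key; not defined by the Parquet spec or arrow-rs.
const BLOB_KEY: &str = "example.blob.location";

/// Appends `blob` to the file body (standing in for bytes written after the
/// last row group but before the footer) and records where it landed.
fn write_blob(file: &mut Vec<u8>, kv: &mut HashMap<String, String>, blob: &[u8]) {
    let offset = file.len();
    file.extend_from_slice(blob);
    // Only the location goes into the metadata, so the value is always valid UTF-8.
    kv.insert(BLOB_KEY.to_string(), format!("{}:{}", offset, blob.len()));
}

/// Parses the location back out of the metadata and returns the bytes.
/// In a real reader this would be a separate ranged read (the extra IO).
fn read_blob<'a>(file: &'a [u8], kv: &HashMap<String, String>) -> Option<&'a [u8]> {
    let (offset, len) = kv.get(BLOB_KEY)?.split_once(':')?;
    let offset: usize = offset.parse().ok()?;
    let len: usize = len.parse().ok()?;
    file.get(offset..offset.checked_add(len)?)
}

fn main() {
    // Placeholder bytes for the "PAR1" magic and some row-group data.
    let mut file = b"PAR1....row groups....".to_vec();
    let mut kv = HashMap::new();
    let blob = [0xff, 0x00, 0xfe, 0x80]; // arbitrary, not valid UTF-8
    write_blob(&mut file, &mut kv, &blob);
    // The thrift footer, its length and the closing "PAR1" would follow here.
    assert_eq!(read_blob(&file, &kv), Some(&blob[..]));
    println!("{}", kv[BLOB_KEY]);
}
```

Storing only the location keeps the metadata value valid UTF-8. The trade-off is the extra ranged read that `read_blob` stands in for.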
   


-- 
This is an automated message from the Apache Git Service.