abellgithub commented on issue #530: URL: https://github.com/apache/parquet-format/issues/530#issuecomment-3446778855
I think the largest problem with the Thrift encoding of the metadata is that you can't find anything: you have to read all the data before locating the thing you want. It seems unfortunate to re-encode all the metadata in another format if there is a way to provide offsets to the things people need to find. It wouldn't need to give 100% direct access, just enough to let people locate things without decoding much that they don't want. This could be done as a binary blob, which is essentially what you're proposing with flatbuf, but it could be much simpler than encoding all (or most) of the data with flatbuf.

Also, having two sets of metadata in one file can lead to inconsistencies: which data do you believe if they don't match?
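
As a rough illustration of the offset idea, here is a minimal sketch in Python. The layout is entirely hypothetical (it is not part of the Parquet format or of the flatbuf proposal), and the names `build_indexed_blob`, `read_index`, and `read_section` are made up for this example. It only shows how a small trailing index lets a reader jump to one section without decoding the others.

```python
import struct

# Hypothetical layout, for illustration only:
#   [section bytes ...][index entries ...][index_offset: u64][entry_count: u32]
# Each index entry is: name_len (u16), name (UTF-8), offset (u64), length (u64).
TRAILER = struct.Struct("<QI")  # little-endian, 12 bytes, no padding


def build_indexed_blob(sections):
    """Concatenate named byte sections and append an offset index."""
    body = bytearray()
    index = bytearray()
    for name, payload in sections.items():
        encoded = name.encode("utf-8")
        index += struct.pack("<H", len(encoded)) + encoded
        # Record where this section starts and how long it is.
        index += struct.pack("<QQ", len(body), len(payload))
        body += payload
    trailer = TRAILER.pack(len(body), len(sections))
    return bytes(body + index + trailer)


def read_index(blob):
    """Decode only the trailing index; section payloads are not touched."""
    index_offset, count = TRAILER.unpack_from(blob, len(blob) - TRAILER.size)
    entries = {}
    pos = index_offset
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        offset, length = struct.unpack_from("<QQ", blob, pos)
        pos += 16
        entries[name] = (offset, length)
    return entries


def read_section(blob, name):
    """Slice out a single section using the index."""
    offset, length = read_index(blob)[name]
    return blob[offset:offset + length]
```

In this sketch a reader would call `read_section(blob, "row_group_0")` and never decode the other sections. The payloads could stay in whatever encoding they already use, so only the index is new.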
