jaychia commented on PR #164: URL: https://github.com/apache/parquet-format/pull/164#issuecomment-1631470875
I've been digging further into parquet-format, and versioning of the spec has definitely been one of the more confusing pieces. I'm glad that there is an effort to define "core" features, or feature presets.

I made a simple tool that reads Parquet files and produces a boolean checklist of the features used in each file: https://github.com/Eventual-Inc/parquet-benchmarking. Running a tool like this on "data in the wild" has been our approach so far for understanding which features are actively being used by the main Parquet producers (Arrow, Spark, Trino, Impala, etc.).

It could be useful for developers of Parquet-producing frameworks to start publishing Parquet read/write feature compatibility documentation. As a community, we can then start to understand which features have been actively adopted and which ones should be considered more experimental or specific to a given framework!
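
For illustration only, a per-file boolean feature checklist of the kind described above could be sketched roughly as follows. This is not the code from the linked repository. The function names `feature_checklist` and `chunks_from_file`, the `encoding:`/`compression:` key format, and the particular features tracked are all assumptions made for this example. Reading the metadata assumes `pyarrow` is installed.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Encoding and codec names as they appear in the parquet-format spec.
# Which ones to track is an arbitrary choice for this sketch.
ENCODINGS = ("PLAIN", "RLE_DICTIONARY", "DELTA_BINARY_PACKED",
             "DELTA_BYTE_ARRAY", "BYTE_STREAM_SPLIT")
COMPRESSIONS = ("SNAPPY", "GZIP", "ZSTD", "LZ4_RAW")


def feature_checklist(chunks: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, bool]:
    """Map each tracked feature to True if any column chunk uses it.

    `chunks` yields one (compression, encodings) pair per column chunk.
    Features outside the tracked lists are ignored.
    """
    checklist = {f"encoding:{e}": False for e in ENCODINGS}
    checklist.update({f"compression:{c}": False for c in COMPRESSIONS})
    for compression, encodings in chunks:
        key = f"compression:{compression}"
        if key in checklist:
            checklist[key] = True
        for encoding in encodings:
            key = f"encoding:{encoding}"
            if key in checklist:
                checklist[key] = True
    return checklist


def chunks_from_file(path: str) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (compression, encodings) for every column chunk in a Parquet file."""
    # Imported lazily so the checklist logic has no hard dependency on pyarrow.
    import pyarrow.parquet as pq

    meta = pq.ParquetFile(path).metadata
    for rg in range(meta.num_row_groups):
        row_group = meta.row_group(rg)
        for col in range(row_group.num_columns):
            chunk = row_group.column(col)
            yield chunk.compression, chunk.encodings
```

With these hypothetical helpers, `feature_checklist(chunks_from_file("some_file.parquet"))` would return one dictionary per file, and comparing those dictionaries across files written by different producers gives a rough picture of which features each one emits.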
