danepitkin opened a new issue, #37943: URL: https://github.com/apache/arrow/issues/37943
### Describe the enhancement requested

Arrow and Parquet do not have exhaustive integration testing for all possible Parquet data types. For example, it would be useful to have a single, simple sample Parquet file that contains only 1 or 2 rows of data but covers as much of the type feature space as possible. Such a file would also be useful for testing backwards compatibility between versions, e.g. to help catch issues like these [1].

The Arrow testing data currently lives in a separate repo [2].

We should:

* Put together a directory/list/repo of Parquet file(s) that hit the cross-section of features/types/encodings, to serve as a good test suite
* Create the infrastructure for actually testing against them, e.g. Parquet reader tests

[1] https://lists.apache.org/thread/4sw2vfmdx60kl2psolwvch8h2297zdkb
[2] https://github.com/apache/arrow-testing/tree/47f7b56b25683202c1fd957668e13f2abafc0f12

### Component(s)

Parquet
