yma11 opened a new issue, #5434: URL: https://github.com/apache/incubator-gluten/issues/5434
### Description

During customer support, we noticed that Velox fails to read some Parquet files, such as those with complicated complex types or varying Parquet schema formats. Several such issues are tracked in the Velox community, including but not limited to [9242](https://github.com/facebookincubator/velox/issues/9242), [9239](https://github.com/facebookincubator/velox/issues/9239), [9238](https://github.com/facebookincubator/velox/issues/9238), and [7776](https://github.com/facebookincubator/velox/issues/7776). It is therefore necessary to adopt fuzz testing for Parquet reads to strengthen support for this format.

We plan to:

1) Port [example files](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop/src/test/resources) from parquet-mr and verify them in Gluten. This is already done by PR [5345](https://github.com/apache/incubator-gluten/pull/5345), and issue [9463](https://github.com/facebookincubator/velox/issues/9463) has been opened in Velox upstream.
2) Leverage the parquet-mr [data generator](https://github.com/apache/parquet-mr/blob/master/parquet-benchmarks/src/main/java/org/apache/parquet/benchmarks/DataGenerator.java) to generate Parquet files and compare scan results between Spark and Gluten.
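
The result comparison in step 2 could look roughly like the sketch below. It is only an illustration: `diff_rows` and `_normalize` are hypothetical helpers, not part of Gluten, Velox, or parquet-mr, and the sketch assumes the two row lists were collected from the same Parquet scan, once with vanilla Spark and once with Gluten enabled. Rows are compared as multisets so that row order does not matter, and NaN values are treated as equal to each other.

```python
import math
from collections import Counter

# Sentinel standing in for NaN, so that NaN cells compare equal to each other.
_NAN = object()


def _normalize(value):
    """Convert a cell (possibly nested) into a hashable, comparable form."""
    if isinstance(value, float) and math.isnan(value):
        return _NAN
    if isinstance(value, (list, tuple)):  # arrays and structs
        return tuple(_normalize(v) for v in value)
    if isinstance(value, dict):  # maps: ignore entry order
        items = ((repr(k), _normalize(v)) for k, v in value.items())
        return tuple(sorted(items, key=lambda kv: kv[0]))
    return value


def diff_rows(expected, actual):
    """Order-insensitive diff of two row collections.

    Returns (missing, unexpected): rows present only in `expected`
    and rows present only in `actual`, each with its surplus count.
    """
    exp = Counter(tuple(_normalize(c) for c in row) for row in expected)
    act = Counter(tuple(_normalize(c) for c in row) for row in actual)
    return exp - act, act - exp


# Hypothetical data standing in for the two collected scan results.
spark_rows = [(1, "a", [1.0, float("nan")]), (2, None, [])]
gluten_rows = [(2, None, []), (1, "a", [1.0, float("nan")])]

missing, unexpected = diff_rows(spark_rows, gluten_rows)
assert not missing and not unexpected
```

Returning both sides of the diff, instead of a single boolean, makes it easier to attach the offending rows to an upstream Velox issue when a generated file exposes a mismatch.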
