[I] Parquet read fuzzer test [incubator-gluten]

via GitHub Wed, 17 Apr 2024 00:01:47 -0700


yma11 opened a new issue, #5434:
URL: https://github.com/apache/incubator-gluten/issues/5434


   ### Description
   
   During customer support, we noticed that Velox fails to read some Parquet 
files, such as those with complicated complex types or with differing Parquet 
schema formats. Several such issues are tracked upstream in Velox, including 
but not limited to [9242](https://github.com/facebookincubator/velox/issues/9242), 
[9239](https://github.com/facebookincubator/velox/issues/9239), 
[9238](https://github.com/facebookincubator/velox/issues/9238), and 
[7776](https://github.com/facebookincubator/velox/issues/7776). A fuzzer test 
for Parquet reads is therefore needed to improve support for this format. 
We plan to:
   1) Port the [example 
files](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop/src/test/resources)
 from parquet-mr and verify them in Gluten. This is already done in PR 
[5345](https://github.com/apache/incubator-gluten/pull/5345), and issue 
[9463](https://github.com/facebookincubator/velox/issues/9463) has been opened 
upstream in Velox.
   2) Leverage the parquet-mr [data 
generator](https://github.com/apache/parquet-mr/blob/master/parquet-benchmarks/src/main/java/org/apache/parquet/benchmarks/DataGenerator.java)
 to generate Parquet files and verify that the scan results match between 
Spark and Gluten. 
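
The comparison in step 2 could look roughly like the sketch below. It is only an illustration, not the planned implementation: `normalize` and `scan_results_match` are hypothetical helper names, and it assumes both engines' scan results have already been collected into plain Python rows (tuples or lists, with arrays as lists and structs or maps as dicts). Rows are compared as multisets, since a scan gives no ordering guarantee, and NaN is mapped to a sentinel because NaN never equals itself.

```python
import math
from collections import Counter


def normalize(value):
    """Turn a scanned value into a hashable form that can be compared."""
    if isinstance(value, float) and math.isnan(value):
        # NaN != NaN, so map every NaN to the same sentinel.
        return ("nan",)
    if isinstance(value, dict):
        # Struct or map column: sort entries by key so entry order is ignored.
        items = [(repr(normalize(k)), normalize(v)) for k, v in value.items()]
        return ("dict", tuple(sorted(items, key=lambda kv: kv[0])))
    if isinstance(value, (list, tuple)):
        # Array column, or a whole row: element order matters here.
        return ("list", tuple(normalize(v) for v in value))
    # Keep the type name so that 1 (int) and 1.0 (float) do not compare equal.
    return (type(value).__name__, value)


def scan_results_match(expected_rows, actual_rows):
    """Compare two scan results as multisets of rows, ignoring row order."""
    expected = Counter(normalize(row) for row in expected_rows)
    actual = Counter(normalize(row) for row in actual_rows)
    return expected == actual


# Hypothetical rows standing in for the Spark and Gluten scan output.
spark_rows = [(1, [1.0, float("nan")], {"a": None}), (2, [], {"b": "x"})]
gluten_rows = [(2, [], {"b": "x"}), (1, [1.0, float("nan")], {"a": None})]
print(scan_results_match(spark_rows, gluten_rows))
```

Keeping the type name in the normalized form is a deliberate choice here: a reader that silently widens or narrows a column type would then show up as a mismatch instead of passing unnoticed.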


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

