timsaucer opened a new issue, #1023: URL: https://github.com/apache/datafusion-python/issues/1023
**Describe the bug**
When running the unit tests with pyarrow-19.0.0 installed, the `test_write_compressed_parquet` tests fail with this error:

```
libc++abi: terminating due to uncaught exception of type parquet::ParquetException: Repetition level histogram size mismatch
```

**To Reproduce**
1. Clone the repo.
2. Initialize submodules.
3. Install pyarrow 19.
4. Run the unit tests.
5. Downgrade to pyarrow 18.
6. Run the unit tests.

**Expected behavior**
The tests should pass. There have been no substantive changes in `datafusion` that would have caused this. The unit test fails at the line that reads

```python
metadata = pq.ParquetFile(tmp_path / file).metadata.to_dict()
```

Specifically, the `pq.ParquetFile()` call is what causes the error above.

**Additional context**
I am not sure whether this is a problem in datafusion, parquet, or pyarrow. It is unlikely that the problem is in `datafusion-python`, but this is where it was identified, so it is worth tracking here IMO.
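Because the failure is an uncaught C++ exception that terminates the whole interpreter, it cannot be caught with `try`/`except` in the test process. A minimal sketch of one way to isolate it is to run the metadata read in a child interpreter and inspect the exit status. The helper names `run_isolated` and `metadata_read_succeeds` are hypothetical and not part of `datafusion-python` or pyarrow; the snippet assumes pyarrow is installed in the child environment and that the Parquet file was already written by the test.

```python
import subprocess
import sys
import textwrap


def run_isolated(snippet: str, *args: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter (hypothetical helper).

    A hard abort in the child (for example an uncaught C++ exception)
    shows up as a non-zero return code instead of killing the caller.
    """
    return subprocess.run(
        [sys.executable, "-c", textwrap.dedent(snippet), *args],
        capture_output=True,
        text=True,
    )


# The call from the failing test, taking the file path from argv.
READ_METADATA = """
import sys
import pyarrow.parquet as pq

pq.ParquetFile(sys.argv[1]).metadata.to_dict()
"""


def metadata_read_succeeds(parquet_path: str) -> bool:
    """Return True if reading the file's metadata exits cleanly."""
    result = run_isolated(READ_METADATA, parquet_path)
    if result.returncode != 0:
        # stderr should carry the libc++abi / ParquetException message, if any.
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0
```

Calling `metadata_read_succeeds(str(tmp_path / file))` under pyarrow 18 and again under pyarrow 19 would then show which environment aborts, without losing the rest of the test run.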
