timsaucer opened a new issue, #1023:
URL: https://github.com/apache/datafusion-python/issues/1023

   **Describe the bug**
   
   When running the unit tests with pyarrow 19.0.0 installed, the `test_write_compressed_parquet` tests fail with an error:
   
   ```
   libc++abi: terminating due to uncaught exception of type parquet::ParquetException: Repetition level histogram size mismatch
   ```
   
   **To Reproduce**
   
   1. Clone the repo.
   2. Initialize submodules.
   3. Install pyarrow 19.
   4. Run the unit tests; `test_write_compressed_parquet` fails.
   5. Downgrade to pyarrow 18.
   6. Run the unit tests again; they pass.
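The reproduction steps can be scripted roughly as follows. This is a sketch that assumes a pip-managed environment in which the `datafusion` extension is already built and that it is run from the repository root. The function names `commands_for` and `reproduce`, and the `pyarrow<19` specifier standing in for "pyarrow 18", are illustrative choices, not part of the project's tooling.

```python
import subprocess
import sys

# pyarrow 19.0.0 is the failing version from the report; "pyarrow<19"
# is an assumed stand-in for the pyarrow 18 downgrade.
REQUIREMENTS = ["pyarrow==19.0.0", "pyarrow<19"]


def commands_for(requirement: str) -> list[list[str]]:
    """Return the install and test commands for one pyarrow requirement (hypothetical helper)."""
    return [
        [sys.executable, "-m", "pip", "install", requirement],
        [sys.executable, "-m", "pytest", "-k", "test_write_compressed_parquet"],
    ]


def reproduce() -> dict[str, int]:
    """Install each pyarrow requirement in turn and record pytest's exit code."""
    results = {}
    for requirement in REQUIREMENTS:
        install, run_tests = commands_for(requirement)
        subprocess.run(install, check=True)
        # The libc++abi abort kills pytest itself, so record the raw return
        # code instead of relying on pytest's own failure summary.
        results[requirement] = subprocess.run(run_tests).returncode
    return results
```

If the behavior matches this report, calling `reproduce()` should show a nonzero exit code for `pyarrow==19.0.0` and zero for `pyarrow<19`.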
   
   **Expected behavior**
   
   These tests should pass. There have been no substantive changes in `datafusion` that could have caused this failure.
   
   The unit test fails at this line:
   
   ```python
   metadata = pq.ParquetFile(tmp_path / file).metadata.to_dict()
   ```
   
   Specifically, the `pq.ParquetFile()` call is what triggers the error above.
   
   **Additional context**
   
   I am not sure whether this is a problem in datafusion, parquet, or pyarrow. It is unlikely that the problem is in `datafusion-python`, but this is where it was identified, so it would be worth tracking here IMO.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]