jorisvandenbossche commented on issue #37853:
URL: https://github.com/apache/arrow/issues/37853#issuecomment-1733422017

   Yes, so it's the different encoding that makes fastparquet fail:
   
   ```
   In [5]: df = pd.DataFrame({"col": [True, False, True]})
   
   In [8]: df.to_parquet("test_bool_pa14_plain.parquet", engine="pyarrow", column_encoding={"col": "PLAIN"}, use_dictionary=False)
   
   In [9]: df.to_parquet("test_bool_pa14_rle.parquet", engine="pyarrow", column_encoding={"col": "RLE"}, use_dictionary=False)
   
   In [10]: pd.read_parquet("test_bool_pa14_plain.parquet", engine="fastparquet")
   Out[10]: 
        col
   0   True
   1  False
   2   True
   
   In [11]: pd.read_parquet("test_bool_pa14_rle.parquet", engine="fastparquet")
   Out[11]: 
        col
   0  False
   1  False
   2  False
   ```
   
   This is getting a bit off-topic for this issue, but maybe that's a good
   argument to actually _do_ run those tests on our CI: we would then have
   noticed this compatibility issue earlier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.