jorisvandenbossche commented on issue #37853:
URL: https://github.com/apache/arrow/issues/37853#issuecomment-1733422017
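Yes, so it's the different encoding that makes fastparquet fail: the boolean column written with RLE encoding is read back as all False, while the PLAIN-encoded file round-trips correctly: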
```
In [5]: df = pd.DataFrame({"col": [True, False, True]})

In [8]: df.to_parquet("test_bool_pa14_plain.parquet", engine="pyarrow",
   ...:               column_encoding={"col": "PLAIN"}, use_dictionary=False)

In [9]: df.to_parquet("test_bool_pa14_rle.parquet", engine="pyarrow",
   ...:               column_encoding={"col": "RLE"}, use_dictionary=False)

In [10]: pd.read_parquet("test_bool_pa14_plain.parquet",
    ...:                 engine="fastparquet")
Out[10]:
     col
0   True
1  False
2   True

In [11]: pd.read_parquet("test_bool_pa14_rle.parquet", engine="fastparquet")
Out[11]:
     col
0  False
1  False
2  False
```
This is getting a bit off-topic for this issue, but maybe that's a good
argument to actually _do_ run those tests on our CI: we would then have noticed
this compat issue earlier.
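
For illustration, a minimal sketch of what such a cross-engine round-trip check could look like. The helper names (`roundtrip_mismatches`, `write_with_pyarrow`, `read_with_fastparquet`) and the file name template are hypothetical and not part of any existing test suite. The writer and reader are passed in as callables, so the comparison logic does not depend on a particular engine. The pandas calls mirror the session above and assume pandas, pyarrow and fastparquet are installed.

```python
from typing import Callable, Dict, List, Sequence


def roundtrip_mismatches(
    values: Sequence[bool],
    encodings: Sequence[str],
    write: Callable[[Sequence[bool], str, str], None],
    read: Callable[[str], List[bool]],
    path_template: str = "test_bool_{encoding}.parquet",  # hypothetical file name
) -> Dict[str, List[bool]]:
    """Return {encoding: values_read_back} for each encoding that did not round-trip."""
    mismatches: Dict[str, List[bool]] = {}
    for encoding in encodings:
        path = path_template.format(encoding=encoding.lower())
        write(values, encoding, path)
        got = list(read(path))
        if got != list(values):
            mismatches[encoding] = got
    return mismatches


def write_with_pyarrow(values: Sequence[bool], encoding: str, path: str) -> None:
    # Same write call as in the session above; pandas is imported lazily
    # so the comparison helper stays usable without it.
    import pandas as pd

    pd.DataFrame({"col": list(values)}).to_parquet(
        path,
        engine="pyarrow",
        column_encoding={"col": encoding},
        use_dictionary=False,
    )


def read_with_fastparquet(path: str) -> List[bool]:
    # Same read call as in the session above.
    import pandas as pd

    return pd.read_parquet(path, engine="fastparquet")["col"].tolist()
```

A CI test could then assert that `roundtrip_mismatches([True, False, True], ["PLAIN", "RLE"], write_with_pyarrow, read_with_fastparquet)` returns an empty dict. With the versions used in the session above, the RLE file would presumably be reported as a mismatch.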