[GitHub] [arrow] jorisvandenbossche commented on issue #37853: [Python][CI] Tests involving fastparquet are never run

via GitHub Mon, 25 Sep 2023 03:49:06 -0700


jorisvandenbossche commented on issue #37853:
URL: https://github.com/apache/arrow/issues/37853#issuecomment-1733422017


   Yes, so it's the different encoding (RLE instead of PLAIN for the boolean column) that makes fastparquet fail; it silently reads back wrong values:
   
   ```
   In [5]: df = pd.DataFrame({"col": [True, False, True]})
   
   In [8]: df.to_parquet("test_bool_pa14_plain.parquet", engine="pyarrow", column_encoding={"col": "PLAIN"}, use_dictionary=False)
   
   In [9]: df.to_parquet("test_bool_pa14_rle.parquet", engine="pyarrow", column_encoding={"col": "RLE"}, use_dictionary=False)
   
   In [10]: pd.read_parquet("test_bool_pa14_plain.parquet", engine="fastparquet")
   Out[10]: 
        col
   0   True
   1  False
   2   True
   
   In [11]: pd.read_parquet("test_bool_pa14_rle.parquet", engine="fastparquet")
   Out[11]: 
        col
   0  False
   1  False
   2  False
   ```
   
   This is getting a bit off-topic for this issue, but maybe that's a good 
argument to actually _do_ run those tests on our CI: we would then have noticed 
this compatibility issue earlier.


