TheNeuralBit commented on pull request #16590:
URL: https://github.com/apache/beam/pull/16590#issuecomment-1029208933


   > Just rebased!
   > 
   > Also added two small commits before merge:
   > 
   > 1. There was a failing doctest, which I skipped, because a new pandas 
[change](https://github.com/pandas-dev/pandas/commit/6e06f895d90bf79401515470cead30b352af91be)
 now allows construction of `DataFrame` with a Series, which fails because it 
calls the `len()` function, which we don't allow.
   
   Sounds good!
   
   > 2. I also added to `CHANGES.md` in this PR. Beam 2.36 is still unreleased, 
but I don't think these changes should be added to the 2.36 cut? Let me know if 
you think this should be in a separate PR.
   
   That's right, it won't be in 2.36 since the branch was already cut. I'll be 
cutting the 2.37 release branch next Wednesday though. It's fine to do it in 
the same PR.
   
   
   It looks like `apache_beam.dataframe.io_test.IOTest.test_read_write_parquet` 
is failing in the `py38-pyarrow-0` configuration (where we verify different 
versions of pyarrow), presumably because pandas 1.4 dropped support for pyarrow 
0.17. Could you just skip this test when pandas >= 1.4 and pyarrow < 1.0 are 
installed? Similar to what we do here: 
https://github.com/apache/beam/blob/9794fb48ab97fd55930efbb8718b5b4415021b78/sdks/python/apache_beam/dataframe/frames_test.py#L239
   
   We could consider just dropping support for pyarrow < 1.0, but technically 
the non-dataframe ParquetIO will still work with it. So I think it's better to 
just skip this test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

