TheNeuralBit commented on pull request #16590: URL: https://github.com/apache/beam/pull/16590#issuecomment-1029208933
> Just rebased!
>
> Also added two small commits before merge:
>
> 1. There was a failing doctest, which I skipped, because a new pandas [change](https://github.com/pandas-dev/pandas/commit/6e06f895d90bf79401515470cead30b352af91be) now allows construction of `DataFrame` with a Series, which fails because it calls the `len()` function, which we don't allow.

Sounds good!

> 2. I also added to `CHANGES.md` to this PR. Beam 2.36 is still unreleased, but I don't think these changes should add to the 2.36 cut? Let me know if you think this should be in a separate PR.

That's right, it won't be in 2.36 since the branch was already cut. I'll be cutting the 2.37 release branch next Wednesday though. It's fine to do it in the same PR.

It looks like `apache_beam.dataframe.io_test.IOTest.test_read_write_parquet` is failing in the `py38-pyarrow-0` configuration (where we verify different versions of pyarrow), presumably because pandas 1.4 dropped support for pyarrow 0.17. Could you just skip this test when pandas >= 1.4 and pyarrow < 1.0 are installed? Similar to what we do here: https://github.com/apache/beam/blob/9794fb48ab97fd55930efbb8718b5b4415021b78/sdks/python/apache_beam/dataframe/frames_test.py#L239

We could consider just dropping support for pyarrow < 1.0, but technically the non-dataframe ParquetIO will still work with it. So I think it's better to just skip this test.
