[
https://issues.apache.org/jira/browse/BEAM-13605?focusedWorklogId=720280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-720280
]
ASF GitHub Bot logged work on BEAM-13605:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 03/Feb/22 17:10
Start Date: 03/Feb/22 17:10
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on pull request #16590:
URL: https://github.com/apache/beam/pull/16590#issuecomment-1029208933
> Just rebased!
>
> Also added two small commits before merge:
>
> 1. There was a failing doctest, which I skipped, because a new pandas
> [change](https://github.com/pandas-dev/pandas/commit/6e06f895d90bf79401515470cead30b352af91be)
> now allows construction of `DataFrame` with a Series, which fails because it
> calls the `len()` function, which we don't allow.
Sounds good!
> 2. I also added an entry to `CHANGES.md` in this PR. Beam 2.36 is still
> unreleased, but I don't think these changes should be added to the 2.36 cut?
> Let me know if you think this should be in a separate PR.
That's right, it won't be in 2.36 since the branch was already cut. I'll be
cutting the 2.37 release branch next Wednesday though. It's fine to do it in
the same PR.
It looks like `apache_beam.dataframe.io_test.IOTest.test_read_write_parquet`
is failing in the `py38-pyarrow-0` configuration (where we verify different
versions of pyarrow), presumably because pandas 1.4 dropped support for pyarrow
0.17. Could you just skip this test when pandas >= 1.4 and pyarrow < 1.0 are
installed? Similar to what we do here:
https://github.com/apache/beam/blob/9794fb48ab97fd55930efbb8718b5b4415021b78/sdks/python/apache_beam/dataframe/frames_test.py#L239
We could consider just dropping support for pyarrow < 1.0, but technically
the non-dataframe ParquetIO will still work with it. So I think it's better to
just skip this test.
Issue Time Tracking
-------------------
Worklog Id: (was: 720280)
Time Spent: 8h 20m (was: 8h 10m)
> Support pandas 1.4.0 in the DataFrame API
> -----------------------------------------
>
> Key: BEAM-13605
> URL: https://issues.apache.org/jira/browse/BEAM-13605
> Project: Beam
> Issue Type: Improvement
> Components: dsl-dataframe
> Reporter: Brian Hulette
> Assignee: Andy Ye
> Priority: P2
> Time Spent: 8h 20m
> Remaining Estimate: 0h
>
> 1.4.0rc1 is out now, we should verify it works with the DataFrame API, then
> increase the version range to allow it.