[ 
https://issues.apache.org/jira/browse/BEAM-13605?focusedWorklogId=720280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-720280
 ]

ASF GitHub Bot logged work on BEAM-13605:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Feb/22 17:10
            Start Date: 03/Feb/22 17:10
    Worklog Time Spent: 10m 
      Work Description: TheNeuralBit commented on pull request #16590:
URL: https://github.com/apache/beam/pull/16590#issuecomment-1029208933


   > Just rebased!
   > 
   > Also added two small commits before merge:
   > 
   > 1. There was a failing doctest, which I skipped, because a new pandas
   > [change](https://github.com/pandas-dev/pandas/commit/6e06f895d90bf79401515470cead30b352af91be)
   > now allows construction of `DataFrame` with a Series, which fails because it
   > calls the `len()` function, which we don't allow.
   
   Sounds good!
   
   > 2. I also added an entry to `CHANGES.md` in this PR. Beam 2.36 is still
   > unreleased, but I don't think these changes should be added to the 2.36 cut?
   > Let me know if you think this should be in a separate PR.
   
   That's right, it won't be in 2.36 since the branch was already cut. I'll be 
cutting the 2.37 release branch next Wednesday though. It's fine to do it in 
the same PR.
   
   
   It looks like `apache_beam.dataframe.io_test.IOTest.test_read_write_parquet` 
is failing in the `py38-pyarrow-0` configuration (where we verify different 
versions of pyarrow), presumably because pandas 1.4 dropped support for pyarrow 
0.17. Could you just skip this test when pandas >= 1.4 and pyarrow < 1.0 are 
installed? Similar to what we do here: 
https://github.com/apache/beam/blob/9794fb48ab97fd55930efbb8718b5b4415021b78/sdks/python/apache_beam/dataframe/frames_test.py#L239
   
   We could consider just dropping support for pyarrow < 1.0, but technically 
the non-dataframe ParquetIO will still work with it. So I think it's better to 
just skip this test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 720280)
    Time Spent: 8h 20m  (was: 8h 10m)

> Support pandas 1.4.0 in the DataFrame API
> -----------------------------------------
>
>                 Key: BEAM-13605
>                 URL: https://issues.apache.org/jira/browse/BEAM-13605
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-dataframe
>            Reporter: Brian Hulette
>            Assignee: Andy Ye
>            Priority: P2
>          Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> 1.4.0rc1 is out now, we should verify it works with the DataFrame API, then 
> increase the version range to allow it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
