[
https://issues.apache.org/jira/browse/BEAM-12379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363243#comment-17363243
]
Brian Hulette commented on BEAM-12379:
--------------------------------------
This corrwith(axis=1)
[test|https://github.com/apache/beam/blob/c0b8e6531f6ade6a9c9e50222542e041954ba911/sdks/python/apache_beam/dataframe/frames_test.py#L636]
fails because the proxy's index has dtype "object", while the result computed
at runtime has an int64 index.
This looks like an upstream pandas bug: corrwith(axis=1) returns a different
index dtype when it is called on empty DataFrames:
{code}
(Pdb) print(df.corrwith(df2, axis=1).index.dtype)
int64
(Pdb) print(df[:0].corrwith(df2[:0], axis=1).index.dtype)
object
{code}
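For anyone who wants to check this outside of pdb, here is a rough standalone sketch of the same comparison. The helper name index_dtypes and the sample data are made up for illustration and are not part of Beam or the linked test; it only assumes that a proxy can be approximated by the empty slice frame[:0], as in the session above. The dtypes printed may differ between pandas versions, so no particular output is claimed.

```python
import pandas as pd


def index_dtypes(func, *frames):
    """Hypothetical helper (not a Beam API).

    Applies func once to empty slices of the inputs (standing in for a
    proxy) and once to the full inputs, then returns the index dtype of
    each result as (proxy_dtype, actual_dtype).
    """
    proxies = [frame[:0] for frame in frames]
    proxy_result = func(*proxies)
    actual_result = func(*frames)
    return proxy_result.index.dtype, actual_result.index.dtype


# Illustrative data only; any two numeric frames with matching columns work.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 7.0]})
df2 = pd.DataFrame({"a": [1.5, 2.5, 2.0], "b": [0.5, 1.0, 3.0]})

proxy_dtype, actual_dtype = index_dtypes(
    lambda x, y: x.corrwith(y, axis=1), df, df2)

# A mismatch here corresponds to the test failure described above.
print("proxy index dtype: ", proxy_dtype)
print("actual index dtype:", actual_dtype)
print("match:", proxy_dtype == actual_dtype)
```

The same helper can be pointed at other operations to see whether the empty-input result agrees with the non-empty one before blaming the proxy logic.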
> Some DataFrame operations yield incorrect proxies
> -------------------------------------------------
>
> Key: BEAM-12379
> URL: https://issues.apache.org/jira/browse/BEAM-12379
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
> Labels: dataframe-api
> Time Spent: 5h
> Remaining Estimate: 0h
>
> There are some operations that yield proxies which do not match the data they
> produce at runtime. We should add tests that verify proxies match, and fix
> the operations where they don't.