[
https://issues.apache.org/jira/browse/SPARK-42397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687794#comment-17687794
]
Ted Chester Jenks commented on SPARK-42397:
-------------------------------------------
Is it ever expected for df.show() and df.collect() to give different results?
That is what struck me as odd in this case, and yes, those two do give
different values.
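
For reference, here is a minimal pandas-only sketch of the kind of cogrouped function that would produce rows shaped like the ones in the description quoted below. This is an assumption reconstructed from the reported output, not the actual code from the issue: the function name `summarize_columns` and the sample values are hypothetical. It only shows how the column set each side receives ends up in `left_colms` and `right_colms`.

```python
import pandas as pd


def summarize_columns(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return one row recording which columns each cogrouped side received.

    Hypothetical stand-in for the function in the issue; in PySpark a function
    of this shape would be passed to
    left_df.groupby(...).cogroup(right_df.groupby(...)).applyInPandas(...).
    """
    return pd.DataFrame(
        {
            "left_colms": [str(left.columns)],
            "right_colms": [str(right.columns)],
        }
    )


# Sample frames using the column names from the report; the values are made up.
left = pd.DataFrame({"cluster": [1], "event": [10], "abc": [100]})
right = pd.DataFrame({"cluster": [1], "event": [10], "def": [200]})

result = summarize_columns(left, right)
print(result.to_dict("records"))
```

Run locally in plain pandas, the left side always includes `event`, so a row where `left_colms` lacks `event` would mean Spark handed the function a different set of columns.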
> Inconsistent data produced by `FlatMapCoGroupsInPandas`
> -------------------------------------------------------
>
> Key: SPARK-42397
> URL: https://issues.apache.org/jira/browse/SPARK-42397
> Project: Spark
> Issue Type: Bug
> Components: Pandas API on Spark, SQL
> Affects Versions: 3.3.0, 3.3.1
> Reporter: Ted Chester Jenks
> Priority: Minor
>
> We are seeing inconsistent data returned when using
> `FlatMapCoGroupsInPandas`. In the PySpark example from the comments, when we
> call `grouped_df.collect()` we get:
>
> {{[Row(left_colms="Index(['cluster', 'event', 'abc'], dtype='object')",
> right_colms="Index(['cluster', 'event', 'def'], dtype='object')")] }}
>
> When we call `grouped_df.show(5, truncate=False)` we get:
>
> {{[Row(left_colms="Index(['cluster', 'abc'], dtype='object')",
> right_colms="Index(['cluster', 'event', 'def'], dtype='object')",
> xyz='1234')] }}
>
> When we call `grouped_df_1.collect()` we get:
>
> {{[Row(left_colms="Index(['cluster', 'abc'], dtype='object')",
> right_colms="Index(['cluster', 'event', 'def'], dtype='object')",
> xyz='1234')] }}
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)