[https://issues.apache.org/jira/browse/BEAM-13966?focusedWorklogId=752251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752251]
ASF GitHub Bot logged work on BEAM-13966:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Apr/22 13:22
Start Date: 04/Apr/22 13:22
Worklog Time Spent: 10m
Work Description: yeandy commented on code in PR #17043:
URL: https://github.com/apache/beam/pull/17043#discussion_r841731499
##########
sdks/python/apache_beam/dataframe/frames_test.py:
##########
@@ -1295,6 +1295,114 @@ def s_times_shuffled(times, s):
     self._run_test(lambda s: s.pipe(s_times, 2), s)
     self._run_test(lambda s: s.pipe((s_times_shuffled, 's'), 2), s)
+  def test_pivot_non_categorical(self):
+    df = pd.DataFrame({
+        'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
+        'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
+        'baz': [1, 2, 3, 4, 5, 6],
+        'zoo': ['x', 'y', 'z', 'q', 'w', 't']
+    })
+    with self.assertRaisesRegex(
+        frame_base.WontImplementError,
+        r"pivot\(\) of non-categorical type is not supported"):
+      self._run_test(
+          lambda df: df.pivot(index='foo', columns='bar', values='baz'), df)
+
+  def test_pivot_pandas_example1(self):
+    # Simple test 1
+    df = pd.DataFrame({
+        'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
+        'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
+        'baz': [1, 2, 3, 4, 5, 6],
+        'zoo': ['x', 'y', 'z', 'q', 'w', 't']
+    })
+    df['bar'] = df['bar'].astype(
+        pd.CategoricalDtype(categories=['A', 'B', 'C']))
+    self._run_test(
+        lambda df: df.pivot(index='foo', columns='bar', values='baz'), df)
+
+  def test_pivot_pandas_example3(self):
+    # Multiple values
+    df = pd.DataFrame({
+        'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
+        'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
+        'baz': [1, 2, 3, 4, 5, 6],
+        'zoo': ['x', 'y', 'z', 'q', 'w', 't']
+    })
+    df['bar'] = df['bar'].astype(
+        pd.CategoricalDtype(categories=['A', 'B', 'C']))
+    self._run_test(
+        lambda df: df.pivot(index='foo', columns='bar', values=['baz', 'zoo']),
+        df)
+
+  def test_pivot_pandas_example4(self):
+    # Multiple columns
+    df = pd.DataFrame({
+        "lev1": [1, 1, 1, 2, 2, 2],
+        "lev2": [1, 1, 2, 1, 1, 2],
+        "lev3": [1, 2, 1, 2, 1, 2],
+        "lev4": [1, 2, 3, 4, 5, 6],
+        "values": [0, 1, 2, 3, 4, 5]
+    })
+    df['lev2'] = df['lev2'].astype(pd.CategoricalDtype(categories=[1, 2]))
+    df['lev3'] = df['lev3'].astype(pd.CategoricalDtype(categories=[1, 2]))
+    df['values'] = df['values'].astype('Int64')
+    self._run_test(
+        lambda df: df.pivot(
+            index="lev1", columns=["lev2", "lev3"], values="values"),
+        df)
+
+  @unittest.skipIf(
+      PD_VERSION < (1, 4), "Bug in DF.pivot with MultiIndex for pandas < 1.4")
Review Comment:
Thanks, changed! And no worries, appreciate the attention to user experience!
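The non-categorical test above expects df.pivot() to raise WontImplementError. A plausible reason, offered here as an assumption rather than as Beam's documented rationale: Beam's DataFrame API constructs the pipeline before reading any data, so the output columns of pivot() have to be derivable from the input types alone, and a CategoricalDtype is what carries that set of labels in the type itself. The pandas-only sketch below illustrates the idea; pivot_columns_from_dtype is a hypothetical helper, not part of Beam or pandas.

```python
import pandas as pd


def pivot_columns_from_dtype(df, columns):
  """Hypothetical helper: predict pivot() output column labels from dtypes alone.

  Returns None when the labels cannot be known without scanning the data.
  """
  dtype = df[columns].dtype
  if isinstance(dtype, pd.CategoricalDtype):
    # The categories are part of the dtype, so no rows need to be read.
    return list(dtype.categories)
  # Object (or any other) dtype: the distinct values are only known at runtime.
  return None


df = pd.DataFrame({
    'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
    'baz': [1, 2, 3, 4, 5, 6],
})

# With a plain object column, nothing can be predicted up front.
print(pivot_columns_from_dtype(df, 'bar'))  # None

# With explicit categories, the prediction matches the columns pandas produces
# (every category is observed in this data).
df['bar'] = df['bar'].astype(pd.CategoricalDtype(categories=['A', 'B', 'C']))
predicted = pivot_columns_from_dtype(df, 'bar')
actual = df.pivot(index='foo', columns='bar', values='baz')
print(predicted)  # ['A', 'B', 'C']
print(list(actual.columns))  # ['A', 'B', 'C']
```

The comparison deliberately uses data in which every category appears; the sketch makes no claim about how either library handles categories that are declared but never observed.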
Issue Time Tracking
-------------------
Worklog Id: (was: 752251)
Time Spent: 5h 10m (was: 5h)
> Implement DataFrame.pivot() for DataFrame API
> ---------------------------------------------
>
> Key: BEAM-13966
> URL: https://issues.apache.org/jira/browse/BEAM-13966
> Project: Beam
> Issue Type: Sub-task
> Components: dsl-dataframe, sdk-py-core
> Reporter: Andy Ye
> Assignee: Andy Ye
> Priority: P3
> Labels: dataframe-api
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)