[ https://issues.apache.org/jira/browse/BEAM-9547?focusedWorklogId=499745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499745 ]
ASF GitHub Bot logged work on BEAM-9547:
----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Oct/20 00:10
Start Date: 13/Oct/20 00:10
Worklog Time Spent: 10m
Work Description: robertwb opened a new pull request #13082:
URL: https://github.com/apache/beam/pull/13082
Doctest results when running the examples from
https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
through the Beam DataFrame API.
Before:
```
250 total test cases:
  0 skipped (0.0%)
  4 won't implement (1.6%)
    3 order-sensitive (75.0%)
    1 Conversion to a non-deferred a numpy array. (25.0%)
  26 not implemented (yet) (10.4%)
    9 NameError following NotImplementedError (34.6%)
    5 'index' is not yet supported (BEAM-9547) (19.2%)
    5 GroupBy.agg currently only supports callable arguments (19.2%)
    1 [Grouper(level=1, axis=0, sort=False), 'A'] (3.8%)
    1 [Grouper(level='second', axis=0, sort=False), 'A'] (3.8%)
    1 ['second', 'A'] (3.8%)
    1 Traceback (most recent call last):\n File
      "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/doctest.py",
      line 1329, in __run\n compileflags, 1), test.globs)\n File "<doctest
      /Users/robertwb/.apache_beam/cache/pandas-1.1.1/doc/source/user_guide/groupby.rst[127]>",
      line 1, in <module>\n grouped = data_df.groupby(key)\n File
      "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/dataframe/frames.py",
      line 441, in groupby\n [self.set_index(by)._expr],\n File
      "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/dataframe/frame_base.py",
      line 303, in wrapper\n return func(**kwargs)\n File
      "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/dataframe/frame_base.py",
      line 334, in wrapper\n return func(**kwargs)\n File
      "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/dataframe/frame_base.py",
      line 282, in wrapper\n result = func(self, **kwargs)\n File
      "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/dataframe/frames.py",
      line 490, in set_index\n raise
      NotImplementedError(keys)\nNotImplementedError: ['US' ,,, 'UK']\n (3.8%)
    1 [TimeGrouper(key='Date', freq=<MonthEnd>, axis=0, sort=True, closed='right', label='right', how='mean', convention='e', origin='start_day'), 'Buyer'] (3.8%)
    1 [TimeGrouper(key='Date', freq=<6 * MonthEnds>, axis=0, sort=True, closed='right', label='right', how='mean', convention='e', origin='start_day'), 'Buyer'] (3.8%)
    1 [TimeGrouper(level='Date', freq=<6 * MonthEnds>, axis=0, sort=True, closed='right', label='right', how='mean', convention='e', origin='start_day'), 'Buyer'] (3.8%)
  104 failed (41.6%)
  116 passed (46.4%)
```
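Several of the largest buckets are labeled "NameError following ...". Presumably this happens because doctest examples share one namespace: once an example that binds a name (such as `grouped = data_df.groupby(key)`) raises, every later example that uses that name fails with a NameError. A minimal sketch of how a harness could attribute those follow-on failures to the original error; `run_examples` and `unsupported_groupby` are hypothetical names, and this is not Beam's actual doctest code.

```python
from collections import Counter


def run_examples(examples, initial_globals=None):
    """Run source snippets in one shared namespace and tally failure reasons.

    A NameError raised after an earlier example failed is attributed to that
    earlier failure ("NameError following ..."), since the name it needed was
    most likely never bound.
    """
    namespace = dict(initial_globals or {})
    reasons = Counter()
    last_error = None  # type name of the most recent non-NameError failure
    for source in examples:
        try:
            exec(source, namespace)
        except NameError:
            if last_error is not None:
                reasons[f"NameError following {last_error}"] += 1
            else:
                reasons["NameError"] += 1
        except Exception as e:
            last_error = type(e).__name__
            reasons[last_error] += 1
    return reasons


def unsupported_groupby(key):
    # Hypothetical stand-in for an operation that is not implemented yet.
    raise NotImplementedError(key)


examples = [
    "grouped = unsupported_groupby('key')",  # raises, so `grouped` is never bound
    "grouped.sum()",                         # NameError following NotImplementedError
    "grouped.mean()",                        # same
    "x = 1 + 1",                             # unrelated example still succeeds
]
counts = run_examples(examples, {"unsupported_groupby": unsupported_groupby})
```

Attributing the follow-on NameErrors to the first failure keeps a single missing operation from inflating the generic failure count.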
After:
```
250 total test cases:
  0 skipped (0.0%)
  15 won't implement (6.0%)
    9 NameError following apache_beam.dataframe.frame_base.WontImplementError (60.0%)
    3 non-deferred (20.0%)
    1 order sensitive (6.7%)
    1 Conversion to a non-deferred a numpy array. (6.7%)
    1 order-sensitive (6.7%)
  51 not implemented (yet) (20.4%)
    16 NameError following NotImplementedError (31.4%)
    14 'get_group' is not yet supported (BEAM-9547) (27.5%)
    6 'order sensitive' is not yet supported (BEAM-9547) (11.8%)
    5 GroupBy.agg currently only supports callable arguments (9.8%)
    3 groupby(as_index=False) (5.9%)
    1 [Grouper(level=1, axis=0, sort=False), 'A'] (2.0%)
    1 [Grouper(level='second', axis=0, sort=False), 'A'] (2.0%)
    1 'rolling' is not yet supported (BEAM-9547) (2.0%)
    1 [TimeGrouper(key='Date', freq=<MonthEnd>, axis=0, sort=True, closed='right', label='right', how='mean', convention='e', origin='start_day'), 'Buyer'] (2.0%)
    1 [TimeGrouper(key='Date', freq=<6 * MonthEnds>, axis=0, sort=True, closed='right', label='right', how='mean', convention='e', origin='start_day'), 'Buyer'] (2.0%)
    1 [TimeGrouper(level='Date', freq=<6 * MonthEnds>, axis=0, sort=True, closed='right', label='right', how='mean', convention='e', origin='start_day'), 'Buyer'] (2.0%)
    1 index.year (2.0%)
  49 failed (19.6%)
  135 passed (54.0%)
```
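The percentages in both summaries are relative to the enclosing count: top-level categories are measured against the 250 total (15 / 250 = 6.0%), while each reason is measured against its own category (3 non-deferred out of 15 won't implement = 20.0%). A minimal sketch of a formatter that reproduces that arithmetic; `format_line` and `summarize` are hypothetical names, not Beam's doctest reporting code.

```python
def format_line(count, label, parent_total, indent=0):
    """Format one summary line; the percentage is relative to the parent count."""
    pct = 100 * count / parent_total
    return f"{' ' * indent}{count} {label} ({pct:.1f}%)"


def summarize(total, categories):
    """Build summary lines.

    `categories` is a list of (label, count, reasons), where `reasons` is a
    list of (reason, reason_count) pairs nested under that category.
    """
    lines = [f"{total} total test cases:"]
    for label, count, reasons in categories:
        # Top-level categories are a share of all test cases.
        lines.append(format_line(count, label, total, indent=2))
        for reason, reason_count in reasons:
            # Reasons are a share of their own category, not of the total.
            lines.append(format_line(reason_count, reason, count, indent=4))
    return lines


# Reproduce part of the "Before" summary from the counts alone.
before = summarize(250, [
    ("skipped", 0, []),
    ("won't implement", 4, [
        ("order-sensitive", 3),
        ("Conversion to a non-deferred a numpy array.", 1),
    ]),
    ("not implemented (yet)", 26, [
        ("NameError following NotImplementedError", 9),
    ]),
    ("failed", 104, []),
    ("passed", 116, []),
])
print("\n".join(before))
```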
Most of what remains is `agg` with multiple aggregations, which will be
addressed in a future PR.
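For context, pandas' `GroupBy.agg` accepts a function, a string function name, a list of these, or a dict mapping columns to them; the "GroupBy.agg currently only supports callable arguments" entries above presumably come from the non-callable forms. A minimal pure-Python sketch of what a multiple-aggregation `agg` computes (one output per function per group), using a hypothetical `group_agg` helper rather than the pandas or Beam implementation:

```python
from collections import defaultdict
from statistics import mean


def group_agg(rows, key, column, funcs):
    """Group `rows` (dicts) by `key` and apply every function in `funcs` to
    `column`, producing one named output per function for each group.

    Similar in spirit to pandas' df.groupby(key)[column].agg([...]), but only
    an illustration: it works on plain dicts and eagerly materializes groups.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {
        k: {f.__name__: f(values) for f in funcs}
        for k, values in groups.items()
    }


rows = [
    {"A": "foo", "C": 1},
    {"A": "bar", "C": 2},
    {"A": "foo", "C": 3},
]
result = group_agg(rows, "A", "C", [sum, mean, max])
# result["foo"] == {"sum": 4, "mean": 2, "max": 3}
```

Supporting a single callable only needs one pass per group, while a list of aggregations needs one output column per function, which is one reason it is natural to handle it separately.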
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 499745)
Time Spent: 22h 40m (was: 22.5h)
> Implement all pandas operations (or raise WontImplementError)
> -------------------------------------------------------------
>
> Key: BEAM-9547
> URL: https://issues.apache.org/jira/browse/BEAM-9547
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Robert Bradshaw
> Priority: P2
> Time Spent: 22h 40m
> Remaining Estimate: 0h
>
> We should have an implementation for every DataFrame, Series, and GroupBy
> method. Everything that's not actually implemented should get a default
> implementation that raises WontImplementError.
> See https://github.com/apache/beam/pull/10757#discussion_r389132292
> Progress at the individual operation level is tracked in a
> [spreadsheet|https://docs.google.com/spreadsheets/d/1hHAaJ0n0k2tw465ORs5tfdy4Lg0DnGWIQ53cLjAhel0/edit];
> consider requesting edit access if you'd like to help out.
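The default-implementation approach described in the issue can be sketched as follows, assuming a deferred frame class and a list of pandas method names to cover. `WontImplementError` here is a local stand-in for `apache_beam.dataframe.frame_base.WontImplementError`, and `fill_unimplemented` and `DeferredSeries` are hypothetical; this is not Beam's implementation.

```python
class WontImplementError(NotImplementedError):
    """Local stand-in for Beam's WontImplementError; subclassing
    NotImplementedError is an assumption made for this sketch."""


def _raising_stub(name, reason):
    # Build a method that always raises, named after the missing operation.
    def stub(self, *args, **kwargs):
        raise WontImplementError(f"'{name}' is not supported: {reason}")
    stub.__name__ = name
    return stub


def fill_unimplemented(cls, method_names, reason):
    """Give `cls` a raising default for every name it does not define yet."""
    for name in method_names:
        if not hasattr(cls, name):
            setattr(cls, name, _raising_stub(name, reason))
    return cls


class DeferredSeries:
    """Toy deferred type with a single real method."""

    def __init__(self, values):
        self._values = list(values)

    def sum(self):
        return sum(self._values)


# Hypothetical subset of pandas.Series method names to cover.
fill_unimplemented(DeferredSeries, ["sum", "cumsum", "to_numpy"],
                   reason="order-sensitive or non-deferred")

s = DeferredSeries([1, 2, 3])
print(s.sum())
try:
    s.cumsum()
except WontImplementError as e:
    print(e)
```

In practice the list of names could be taken from something like `dir(pandas.Series)`, so that any method not explicitly implemented fails loudly with a clear reason instead of being silently missing.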
--
This message was sent by Atlassian Jira
(v8.3.4#803005)