[
https://issues.apache.org/jira/browse/BEAM-12565?focusedWorklogId=693375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-693375
]
ASF GitHub Bot logged work on BEAM-12565:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Dec/21 16:19
Start Date: 09/Dec/21 16:19
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on a change in pull request
#16027:
URL: https://github.com/apache/beam/pull/16027#discussion_r765931712
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,35 @@ def value_counts(self, subset=None, sort=False,
normalize=False,
else:
return result
+ @frame_base.with_docs_from(pd.DataFrame)
+ @frame_base.args_to_kwargs(pd.DataFrame)
+ @frame_base.populate_defaults(pd.DataFrame)
+ def compare(self, other, **kwargs):
+ align_axis = kwargs.get('align_axis', 1)
+ keep_shape = kwargs.get('keep_shape', False)
Review comment:
if you put `align_axis` and `keep_shape` in the argument list, then
`@frame_base.populate_defaults` should pull the default values from the pandas
code, and you won't need to specify them here. Did you try that?
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,18 @@ def value_counts(self, subset=None, sort=False,
normalize=False,
else:
return result
+ @frame_base.with_docs_from(pd.DataFrame)
+ @frame_base.args_to_kwargs(pd.DataFrame)
+ @frame_base.populate_defaults(pd.DataFrame)
+ def compare(self, other, **kwargs):
Review comment:
Great, thanks!
One more thing: any time there's a caveat like the wontimplementerror here,
we should add a docstring for it, we have infrastructure that puts this in a
"Differences from pandaas" section. See
[here](https://beam.apache.org/releases/pydoc/2.33.0/apache_beam.dataframe.frames.html#module-apache_beam.dataframe.frames).
Something like:
```suggestion
def compare(self, other, **kwargs):
"""The default values ``align_axis=1 and ``keep_shape=False`` are not
supported, because the output columns depend on the data. To use
``align_axis=1``, please specify ``keep_shape=True``."""
```
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,35 @@ def value_counts(self, subset=None, sort=False,
normalize=False,
else:
return result
+ @frame_base.with_docs_from(pd.DataFrame)
+ @frame_base.args_to_kwargs(pd.DataFrame)
+ @frame_base.populate_defaults(pd.DataFrame)
+ def compare(self, other, **kwargs):
+ align_axis = kwargs.get('align_axis', 1)
+ keep_shape = kwargs.get('keep_shape', False)
+
+ preserve_partition = None
+
+ if align_axis and not keep_shape:
+ raise frame_base.WontImplementError(
+ "compare(align_axis=1, keep_shape=False) is not allowed",
+ reason='non-deferred-columns'
Review comment:
A couple nits on the exception:
```suggestion
f"compare(align_axis={align_axis!r}, keep_shape={keep_shape!r}) is
not allowed because the output columns depend on the data, please specify
keep_shape=True.",
reason='non-deferred-columns'
```
- We can use a format string to display the actual user-specified args
- Added a little detail about why this happened, and how to address it
(this may not work as written, it probably needs some linting)
##########
File path: sdks/python/apache_beam/dataframe/pandas_doctests_test.py
##########
@@ -492,7 +493,7 @@ def test_series_tests(self):
's1.append(s2, verify_integrity=True)',
],
# Throws NotImplementedError when modifying df
- 'pandas.core.series.Series.compare': ['*'],
+ # 'pandas.core.series.Series.compare': ['*'],
Review comment:
nit:
```suggestion
```
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,35 @@ def value_counts(self, subset=None, sort=False,
normalize=False,
else:
return result
+ @frame_base.with_docs_from(pd.DataFrame)
+ @frame_base.args_to_kwargs(pd.DataFrame)
+ @frame_base.populate_defaults(pd.DataFrame)
+ def compare(self, other, **kwargs):
+ align_axis = kwargs.get('align_axis', 1)
+ keep_shape = kwargs.get('keep_shape', False)
+
+ preserve_partition = None
+
+ if align_axis and not keep_shape:
Review comment:
`align_axis` is allowed to be 'columns' or 'index'
```suggestion
if align_axis in (1, 'columns') and not keep_shape:
```
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,35 @@ def value_counts(self, subset=None, sort=False,
normalize=False,
else:
return result
+ @frame_base.with_docs_from(pd.DataFrame)
+ @frame_base.args_to_kwargs(pd.DataFrame)
+ @frame_base.populate_defaults(pd.DataFrame)
+ def compare(self, other, **kwargs):
+ align_axis = kwargs.get('align_axis', 1)
+ keep_shape = kwargs.get('keep_shape', False)
+
+ preserve_partition = None
+
+ if align_axis and not keep_shape:
+ raise frame_base.WontImplementError(
+ "compare(align_axis=1, keep_shape=False) is not allowed",
+ reason='non-deferred-columns'
+ )
+
+ if align_axis:
Review comment:
```suggestion
if align_axis in (1, 'columns'):
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 693375)
Time Spent: 3h (was: 2h 50m)
> Implement compare for DataFrame and Series
> ------------------------------------------
>
> Key: BEAM-12565
> URL: https://issues.apache.org/jira/browse/BEAM-12565
> Project: Beam
> Issue Type: Improvement
> Components: dsl-dataframe
> Reporter: Brian Hulette
> Assignee: Mike Hernandez
> Priority: P3
> Time Spent: 3h
> Remaining Estimate: 0h
>
> Add an implementation of
> [compare|https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.compare.html]
> for DeferredDataFrame and DeferredSeries. It should be fully unit tested
> with some combination of pandas_doctests_test.py and frames_test.py.
> https://github.com/apache/beam/pull/14274 is an example of a typical PR that
> adds new operations. See
> https://lists.apache.org/thread.html/r8ffe96d756054610dfdb4e849ffc6a741e826d15ba7e5bdeee1b4266%40%3Cdev.beam.apache.org%3E
> for background on the DataFrame API.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)