TheNeuralBit commented on a change in pull request #16027:
URL: https://github.com/apache/beam/pull/16027#discussion_r765931712



##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,35 @@ def value_counts(self, subset=None, sort=False, 
normalize=False,
       else:
         return result
 
+  @frame_base.with_docs_from(pd.DataFrame)
+  @frame_base.args_to_kwargs(pd.DataFrame)
+  @frame_base.populate_defaults(pd.DataFrame)
+  def compare(self, other, **kwargs):
+    align_axis = kwargs.get('align_axis', 1)
+    keep_shape = kwargs.get('keep_shape', False)

Review comment:
       if you put `align_axis` and `keep_shape` in the argument list, then 
`@frame_base.populate_defaults` should pull the default values from the pandas 
code, and you won't need to specify them here. Did you try that?

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,18 @@ def value_counts(self, subset=None, sort=False, 
normalize=False,
       else:
         return result
 
+  @frame_base.with_docs_from(pd.DataFrame)
+  @frame_base.args_to_kwargs(pd.DataFrame)
+  @frame_base.populate_defaults(pd.DataFrame)
+  def compare(self, other, **kwargs):

Review comment:
       Great, thanks!
   
   One more thing: any time there's a caveat like the wontimplementerror here, 
we should add a docstring for it, we have infrastructure that puts this in a 
"Differences from pandaas" section. See 
[here](https://beam.apache.org/releases/pydoc/2.33.0/apache_beam.dataframe.frames.html#module-apache_beam.dataframe.frames).
   
   Something like:
   ```suggestion
     def compare(self, other, **kwargs):
       """The default values ``align_axis=1 and ``keep_shape=False`` are not 
supported,  because the output columns depend on the data. To use 
``align_axis=1``, please specify ``keep_shape=True``."""
   ```

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,35 @@ def value_counts(self, subset=None, sort=False, 
normalize=False,
       else:
         return result
 
+  @frame_base.with_docs_from(pd.DataFrame)
+  @frame_base.args_to_kwargs(pd.DataFrame)
+  @frame_base.populate_defaults(pd.DataFrame)
+  def compare(self, other, **kwargs):
+    align_axis = kwargs.get('align_axis', 1)
+    keep_shape = kwargs.get('keep_shape', False)
+
+    preserve_partition = None
+
+    if align_axis and not keep_shape:
+      raise frame_base.WontImplementError(
+        "compare(align_axis=1, keep_shape=False) is not allowed",
+        reason='non-deferred-columns'

Review comment:
       A couple nits on the exception:
   
   ```suggestion
           f"compare(align_axis={align_axis!r}, keep_shape={keep_shape!r}) is 
not allowed because the output columns depend on the data, please specify 
keep_shape=True.",
           reason='non-deferred-columns'
   ```
   
   - We can use a format string to display the actual user-specified args
   - Added a little detail about why this happened, and how to address it
   
   (this may not work as written, it probably needs some linting)

##########
File path: sdks/python/apache_beam/dataframe/pandas_doctests_test.py
##########
@@ -492,7 +493,7 @@ def test_series_tests(self):
                 's1.append(s2, verify_integrity=True)',
             ],
             # Throws NotImplementedError when modifying df
-            'pandas.core.series.Series.compare': ['*'],
+            # 'pandas.core.series.Series.compare': ['*'],

Review comment:
       nit:
   ```suggestion
   ```

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,35 @@ def value_counts(self, subset=None, sort=False, 
normalize=False,
       else:
         return result
 
+  @frame_base.with_docs_from(pd.DataFrame)
+  @frame_base.args_to_kwargs(pd.DataFrame)
+  @frame_base.populate_defaults(pd.DataFrame)
+  def compare(self, other, **kwargs):
+    align_axis = kwargs.get('align_axis', 1)
+    keep_shape = kwargs.get('keep_shape', False)
+
+    preserve_partition = None
+
+    if align_axis and not keep_shape:

Review comment:
       `align_axis` is allowed to be 'columns' or 'index'
   ```suggestion
       if align_axis in (1, 'columns') and not keep_shape:
   ```

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3441,6 +3460,35 @@ def value_counts(self, subset=None, sort=False, 
normalize=False,
       else:
         return result
 
+  @frame_base.with_docs_from(pd.DataFrame)
+  @frame_base.args_to_kwargs(pd.DataFrame)
+  @frame_base.populate_defaults(pd.DataFrame)
+  def compare(self, other, **kwargs):
+    align_axis = kwargs.get('align_axis', 1)
+    keep_shape = kwargs.get('keep_shape', False)
+
+    preserve_partition = None
+
+    if align_axis and not keep_shape:
+      raise frame_base.WontImplementError(
+        "compare(align_axis=1, keep_shape=False) is not allowed",
+        reason='non-deferred-columns'
+      )
+
+    if align_axis:

Review comment:
       ```suggestion
       if align_axis in (1, 'columns'):
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to