yeandy commented on a change in pull request #16590:
URL: https://github.com/apache/beam/pull/16590#discussion_r791266952



##########
File path: sdks/python/apache_beam/dataframe/pandas_doctests_test.py
##########
@@ -121,11 +122,12 @@ def test_ndframe_tests(self):
             'pandas.core.generic.NDFrame.convert_dtypes': ['*'],
             'pandas.core.generic.NDFrame.copy': ['*'],
             'pandas.core.generic.NDFrame.droplevel': ['*'],
+            'pandas.core.generic.NDFrame.get': ['*'],
             'pandas.core.generic.NDFrame.rank': [
                 # Modified dataframe
                 'df'
             ],
-            'pandas.core.generic.NDFrame.rename': [
+            'pandas.core.generic.NDFrame._rename': [

Review comment:
      The `rename` [function](https://github.com/pandas-dev/pandas/blame/ea2b0fdc64d2a7d28b5e622d9617d7236374fbbe/pandas/core/frame.py#L5092) got renamed (no pun intended 😄) to `_rename`.

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -638,10 +638,13 @@ def replace(self, to_replace, value, limit, method, **kwargs):
     order-sensitive. It cannot be specified.
 
     If ``limit`` is specified this operation is not parallelizable."""
+    from pandas._libs import lib
     if method is not None and not isinstance(to_replace,
-                                             dict) and value is None:
+                                             dict) and value is lib.no_default:

Review comment:
      After reading the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html), it looks like we don't do the order-sensitive padding if `None` is explicitly passed. Please confirm my understanding.
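
      In other words, the diff above looks like a sentinel check: pandas' `lib.no_default` marks "no value passed", while an explicit `value=None` counts as a real value. A minimal sketch of that pattern (hypothetical helper name, and it assumes the wrapper's `value` parameter defaults to `lib.no_default`; this is not the Beam implementation):

```python
from pandas._libs import lib


def uses_order_sensitive_padding(to_replace, value, method):
  # Hypothetical helper: the padding path only applies when the caller did
  # not pass a value at all (value is still the lib.no_default sentinel).
  # An explicit value=None is a real replacement value and skips it.
  return (
      method is not None and not isinstance(to_replace, dict) and
      value is lib.no_default)
```

      Under that reading, `replace('a', method='pad')` would take the order-sensitive path, while `replace('a', value=None, method='pad')` would not.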

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -4120,6 +4126,22 @@ def dtypes(self):
     grouping_columns = self._grouping_columns
     return self.apply(lambda df: df.drop(grouping_columns, axis=1).dtypes)
 
+  @frame_base.with_docs_from(DataFrameGroupBy)
+  def value_counts(self, subset=None, sort=False, normalize=False,
+                    ascending=False, dropna=True):
+    return frame_base.DeferredFrame.wrap(
+        expressions.ComputedExpression(
+            'value_counts',
+            lambda df: df.value_counts(
+              subset=subset,
+              sort=sort,
+              normalize=normalize,
+              ascending=ascending,
+              dropna=dropna), [self._expr],
+            preserves_partition_by=partitionings.Arbitrary(),
+            requires_partition_by=partitionings.Arbitrary())

Review comment:
       How should we do the partitioning?
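
      One possibility, as a sketch only (it reuses the module's existing `frame_base`, `expressions`, and `partitionings` helpers, and whether it is right depends on how the groupby puts the grouping keys on the index): if `value_counts` has to see every row of a group to count correctly, the expression could require `partitionings.Index()` instead of `Arbitrary()`, so rows sharing a group key are colocated:

```python
  @frame_base.with_docs_from(DataFrameGroupBy)
  def value_counts(self, subset=None, sort=False, normalize=False,
                   ascending=False, dropna=True):
    # Sketch: Index partitioning keeps all rows with the same index values
    # (and hence, after the groupby projection, the same group key) on one
    # worker, so per-group counts are computed over complete groups.
    return frame_base.DeferredFrame.wrap(
        expressions.ComputedExpression(
            'value_counts',
            lambda df: df.value_counts(
                subset=subset,
                sort=sort,
                normalize=normalize,
                ascending=ascending,
                dropna=dropna),
            [self._expr],
            preserves_partition_by=partitionings.Arbitrary(),
            requires_partition_by=partitionings.Index()))
```

      `Singleton()` would also be correct but forces the whole input onto one worker; `Index()` keeps the computation parallel across groups.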




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

