[
https://issues.apache.org/jira/browse/BEAM-13605?focusedWorklogId=714869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-714869
]
ASF GitHub Bot logged work on BEAM-13605:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Jan/22 16:53
Start Date: 25/Jan/22 16:53
Worklog Time Spent: 10m
Work Description: yeandy commented on a change in pull request #16590:
URL: https://github.com/apache/beam/pull/16590#discussion_r791266952
##########
File path: sdks/python/apache_beam/dataframe/pandas_doctests_test.py
##########
@@ -121,11 +122,12 @@ def test_ndframe_tests(self):
'pandas.core.generic.NDFrame.convert_dtypes': ['*'],
'pandas.core.generic.NDFrame.copy': ['*'],
'pandas.core.generic.NDFrame.droplevel': ['*'],
+ 'pandas.core.generic.NDFrame.get': ['*'],
'pandas.core.generic.NDFrame.rank': [
# Modified dataframe
'df'
],
- 'pandas.core.generic.NDFrame.rename': [
+ 'pandas.core.generic.NDFrame._rename': [
Review comment:
The `rename`
[function](https://github.com/pandas-dev/pandas/blame/ea2b0fdc64d2a7d28b5e622d9617d7236374fbbe/pandas/core/frame.py#L5092)
got renamed (no pun intended 😄) to `_rename`.
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -638,10 +638,13 @@ def replace(self, to_replace, value, limit, method,
**kwargs):
order-sensitive. It cannot be specified.
If ``limit`` is specified this operation is not parallelizable."""
+ from pandas._libs import lib
if method is not None and not isinstance(to_replace,
- dict) and value is None:
+ dict) and value is lib.no_default:
Review comment:
After reading the
[documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html),
it looks like the order-sensitive padding is not done when `None` is explicitly passed as `value`.
Please confirm my understanding.
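To illustrate the distinction the diff relies on, here is a minimal sketch of the sentinel pattern in plain Python. The names `_NO_DEFAULT` and `describe_replace_value` are hypothetical and exist only for this example; pandas uses its own sentinel object (`lib.no_default` in the diff above), and this is not its implementation.

```python
# Hypothetical sentinel for illustration only; the diff above uses
# pandas' own object, `pandas._libs.lib.no_default`.
_NO_DEFAULT = object()


def describe_replace_value(value=_NO_DEFAULT):
    """Report how a replace-style function could interpret `value`."""
    if value is _NO_DEFAULT:
        # The caller did not pass `value` at all.
        return "omitted"
    if value is None:
        # The caller passed None on purpose; a `value is None` check alone
        # cannot tell this case apart from the omitted one.
        return "explicit None"
    return "explicit value"
```

With a sentinel default, `describe_replace_value()` and `describe_replace_value(None)` take different branches, which is the behavior the `value is lib.no_default` check appears to be after.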
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -4120,6 +4126,22 @@ def dtypes(self):
grouping_columns = self._grouping_columns
return self.apply(lambda df: df.drop(grouping_columns, axis=1).dtypes)
+ @frame_base.with_docs_from(DataFrameGroupBy)
+ def value_counts(self, subset=None, sort=False, normalize=False,
+ ascending=False, dropna=True):
+ return frame_base.DeferredFrame.wrap(
+ expressions.ComputedExpression(
+ 'value_counts',
+ lambda df: df.value_counts(
+ subset=subset,
+ sort=sort,
+ normalize=normalize,
+ ascending=ascending,
+ dropna=True), [self._expr],
+ preserves_partition_by=partitionings.Arbitrary(),
+ requires_partition_by=partitionings.Arbitrary())
Review comment:
How should we do the partitioning?
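A rough sketch of why the question matters, in plain Python rather than Beam or pandas: if rows from the same group can land in different partitions, per-partition counts are only partial and have to be combined afterwards. The helpers `partial_counts` and `merge_counts` are hypothetical names for this example, and it says nothing about which `partitionings` value is right here.

```python
from collections import Counter


def partial_counts(rows):
    """Count (group_key, value) pairs within a single partition."""
    return Counter(rows)


def merge_counts(partials):
    """Sum per-partition counts into counts over the whole dataset."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


# Rows are (group_key, value) pairs; group "a" is split across partitions.
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 1), ("b", 1)]
partition_1, partition_2 = rows[:2], rows[2:]

merged = merge_counts(
    [partial_counts(partition_1), partial_counts(partition_2)])
```

Plain counts can be merged this way, but normalized proportions cannot be summed across partitions, because each proportion depends on the total size of its group.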
Issue Time Tracking
-------------------
Worklog Id: (was: 714869)
Time Spent: 2.5h (was: 2h 20m)
> Support pandas 1.4.0 in the DataFrame API
> -----------------------------------------
>
> Key: BEAM-13605
> URL: https://issues.apache.org/jira/browse/BEAM-13605
> Project: Beam
> Issue Type: Improvement
> Components: dsl-dataframe
> Reporter: Brian Hulette
> Assignee: Andy Ye
> Priority: P2
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> 1.4.0rc1 is out now. We should verify that it works with the DataFrame API, then
> increase the version range to allow it.