[
https://issues.apache.org/jira/browse/BEAM-13605?focusedWorklogId=719113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-719113
]
ASF GitHub Bot logged work on BEAM-13605:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 02/Feb/22 00:45
Start Date: 02/Feb/22 00:45
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on a change in pull request
#16590:
URL: https://github.com/apache/beam/pull/16590#discussion_r797186975
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -4120,6 +4127,23 @@ def dtypes(self):
     grouping_columns = self._grouping_columns
     return self.apply(lambda df: df.drop(grouping_columns, axis=1).dtypes)
+  if hasattr(DataFrameGroupBy, 'value_counts'):
+    @frame_base.with_docs_from(DataFrameGroupBy)
+    def value_counts(self, subset=None, sort=False, normalize=False,
+                     ascending=False, dropna=True):
+      return frame_base.DeferredFrame.wrap(
+          expressions.ComputedExpression(
+              'value_counts',
+              lambda df: df.value_counts(
+                  subset=subset,
+                  sort=sort,
+                  normalize=normalize,
+                  ascending=ascending,
+                  dropna=dropna), [self._expr],
+              preserves_partition_by=partitionings.Arbitrary(),
+              requires_partition_by=partitionings.Arbitrary())
+      )
Review comment:
Thanks for trying this out and for the detailed write-up! I think it
makes sense to just stick with your current solution. Let's not bother adding
`self._parent` until we need it.
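
For readers unfamiliar with the pattern in the hunk above, here is a minimal, library-free sketch of defining a wrapper method only when the wrapped class provides it, and forwarding keyword arguments by name. `LegacyGroupBy`, `ModernGroupBy`, and `make_wrapper` are hypothetical stand-ins invented for illustration; they are not Beam or pandas APIs, and this is not the code from the pull request.

```python
class LegacyGroupBy:
    """Hypothetical stand-in for an upstream class in an older release."""

    def count(self):
        return 0


class ModernGroupBy(LegacyGroupBy):
    """Hypothetical stand-in for the same class in a newer release."""

    def value_counts(self, sort=False, dropna=True):
        # Echo the arguments so callers can see what was forwarded.
        return {"sort": sort, "dropna": dropna}


def make_wrapper(upstream_cls):
    """Build a wrapper class that mirrors value_counts only if upstream has it."""

    class Wrapper:
        def __init__(self):
            self._inner = upstream_cls()

        # A class body runs once, at definition time, so this check decides
        # whether the method exists on Wrapper at all.
        if hasattr(upstream_cls, "value_counts"):

            def value_counts(self, sort=False, dropna=True):
                # Forward each argument by name instead of hard-coding a value.
                return self._inner.value_counts(sort=sort, dropna=dropna)

    return Wrapper


OldWrapper = make_wrapper(LegacyGroupBy)
NewWrapper = make_wrapper(ModernGroupBy)
print(hasattr(OldWrapper, "value_counts"))
print(NewWrapper().value_counts(dropna=False))
```

With this shape, a caller on the older stand-in gets an ordinary `AttributeError` when asking for `value_counts`, rather than a wrapper that fails later inside the call.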
Issue Time Tracking
-------------------
Worklog Id: (was: 719113)
Time Spent: 4h 10m (was: 4h)
> Support pandas 1.4.0 in the DataFrame API
> -----------------------------------------
>
> Key: BEAM-13605
> URL: https://issues.apache.org/jira/browse/BEAM-13605
> Project: Beam
> Issue Type: Improvement
> Components: dsl-dataframe
> Reporter: Brian Hulette
> Assignee: Andy Ye
> Priority: P2
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> pandas 1.4.0rc1 is out now. We should verify that it works with the DataFrame
> API, then widen the supported version range to allow it.