[
https://issues.apache.org/jira/browse/BEAM-9547?focusedWorklogId=578690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-578690
]
ASF GitHub Bot logged work on BEAM-9547:
----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/21 21:28
Start Date: 07/Apr/21 21:28
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on a change in pull request
#14362:
URL: https://github.com/apache/beam/pull/14362#discussion_r609079270
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -486,11 +486,35 @@ def _get_index(self):
first = last = frame_base.wont_implement_method('order-sensitive')
head = tail = frame_base.wont_implement_method('order-sensitive')
+ interpolate = frame_base.wont_implement_method('order-sensitive')
@populate_not_implemented(pd.Series)
@frame_base.DeferredFrame._register_for(pd.Series)
class DeferredSeries(DeferredDataFrameOrSeries):
+ @property
+ def name(self):
+ return self._expr.proxy().name
Review comment:
The `proxy()` for an expression (`self._expr`) should never change.
In general the `_expr` for a `DeferredFrame` does not change, but it can
happen, for example when performing an inplace operation:
https://github.com/apache/beam/blob/a8cd05932bed9b2480316fb8518409636cb2733b/sdks/python/apache_beam/dataframe/frame_base.py#L328-L337
So if you access `name` before and after an inplace operation it will point
to a different proxy.
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -2164,6 +2186,20 @@ def __init__(self, frame):
def names(self):
return self._frame._expr.proxy().index.names
+ @names.setter
+ def names(self, value):
+ def set_index_names(df):
+ df = df.copy()
+ df.index.names = value
+ return df
Review comment:
This is to avoid mutating the original `DataFrame`, since PCollection
elements are supposed to be immutable.
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -470,12 +470,12 @@ def mask(self, cond, **kwargs):
def dtype(self):
return self._expr.proxy().dtype
+ isin = frame_base._elementwise_method('isin')
+
@property
def ndim(self):
return self._expr.proxy().ndim
- dtypes = dtype
-
Review comment:
Yeah thats a good question, this is misleading from this diff alone.
This is removing `DeferredDataFrameOrSeries.dtypes`, because Series isn't
supposed to have a `dtypes` accessor (only `dtype`), and we already have an
explicit one in `DeferredDataFrame`:
https://github.com/apache/beam/blob/a8cd05932bed9b2480316fb8518409636cb2733b/sdks/python/apache_beam/dataframe/frames.py#L1130-L1132
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 578690)
Time Spent: 68h (was: 67h 50m)
> Implement all pandas operations (or raise WontImplementError)
> -------------------------------------------------------------
>
> Key: BEAM-9547
> URL: https://issues.apache.org/jira/browse/BEAM-9547
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Robert Bradshaw
> Priority: P2
> Labels: dataframe-api
> Time Spent: 68h
> Remaining Estimate: 0h
>
> We should have an implementation for every DataFrame, Series, and GroupBy
> method. Everything that's not possible to implement should get a default
> implementation that raises WontImplementError
> SeeĀ https://github.com/apache/beam/pull/10757#discussion_r389132292
> Progress at the individual operation level is tracked in a
> [spreadsheet|https://docs.google.com/spreadsheets/d/1hHAaJ0n0k2tw465ORs5tfdy4Lg0DnGWIQ53cLjAhel0/edit],
> consider requesting edit access if you'd like to help out.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)