[jira] [Work logged] (BEAM-9547) Implement all pandas operations (or raise WontImplementError)

ASF GitHub Bot (Jira) Mon, 14 Jun 2021 00:27:08 -0700


     [ https://issues.apache.org/jira/browse/BEAM-9547?focusedWorklogId=610188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610188 ]


ASF GitHub Bot logged work on BEAM-9547:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/21 07:26
            Start Date: 14/Jun/21 07:26
    Worklog Time Spent: 10m 
      Work Description: TheNeuralBit commented on a change in pull request #14908:
URL: https://github.com/apache/beam/pull/14908#discussion_r650152641



##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -1541,6 +1541,19 @@ def repeat(self, repeats, axis):
           "repeat(repeats=) value must be an int or a "
           f"DeferredSeries (encountered {type(repeats)}).")
 
+  @frame_base.with_docs_from(pd.Series)

Review comment:
       Hm, good thing you asked for this. When I wrote a test for it, I 
realized that argsort is actually order-sensitive. It returns indexes that can 
be used with loc to impose the sorted order, so the result depends on the order 
in which argsort observes the data.
   
   I think I had in mind that it returned "this element is the Nth largest", 
which would be independent of the input ordering. I think we should just make 
this WontImplement(order-sensitive). The rest of this PR could be useful, 
though.
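
The distinction drawn in this comment can be illustrated with a small pure-Python sketch. It does not use pandas or Beam: `argsort` and `rank` below are hypothetical stand-ins written for this illustration, assumed to mimic "positions that would sort the data" and "this element is the Nth smallest" respectively, and the sample data is made up.

```python
def argsort(values):
    """Positions that would sort `values` (stand-in for an argsort-style result)."""
    return sorted(range(len(values)), key=lambda i: values[i])


def rank(values):
    """0-based rank of each element: "this element is the Nth smallest"."""
    ranks = [0] * len(values)
    for r, position in enumerate(argsort(values)):
        ranks[position] = r
    return ranks


# The same three labeled elements, observed in two different orders.
data = {"a": 30, "b": 10, "c": 20}
order1 = ["a", "b", "c"]
order2 = ["c", "a", "b"]


def per_label(labels, func):
    """Apply `func` to the values in the given order and key the result by label."""
    return dict(zip(labels, func([data[k] for k in labels])))


argsort1 = per_label(order1, argsort)
argsort2 = per_label(order2, argsort)
rank1 = per_label(order1, rank)
rank2 = per_label(order2, rank)

print(argsort1 == argsort2)  # False: the value attached to each label changes
print(rank1 == rank2)        # True: each label keeps the same rank
```

Under these assumptions, the argsort-style result attached to a given element changes when the observation order changes, while the rank-style result does not.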

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -1541,6 +1541,19 @@ def repeat(self, repeats, axis):
           "repeat(repeats=) value must be an int or a "
           f"DeferredSeries (encountered {type(repeats)}).")
 
+  @frame_base.with_docs_from(pd.Series)

Review comment:
       Done. I also updated the logic for indexing loc with a DeferredSeries 
of labels to be more general (not just the integer dtype case), and made loc 
available on `DeferredSeries`. Could you take another look?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 610188)
    Time Spent: 117h 40m  (was: 117.5h)

> Implement all pandas operations (or raise WontImplementError)
> -------------------------------------------------------------
>
>                 Key: BEAM-9547
>                 URL: https://issues.apache.org/jira/browse/BEAM-9547
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Robert Bradshaw
>            Priority: P2
>              Labels: dataframe-api
>          Time Spent: 117h 40m
>  Remaining Estimate: 0h
>
> We should have an implementation for every DataFrame, Series, and GroupBy 
> method. Everything that's not possible to implement should get a default 
> implementation that raises WontImplementError.
> See https://github.com/apache/beam/pull/10757#discussion_r389132292
> Progress at the individual operation level is tracked in a 
> [spreadsheet|https://docs.google.com/spreadsheets/d/1hHAaJ0n0k2tw465ORs5tfdy4Lg0DnGWIQ53cLjAhel0/edit].
> Consider requesting edit access if you'd like to help out.
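
The "default implementation that raises WontImplementError" idea from the description might look roughly like the sketch below. This is not Beam's actual code: the `WontImplementError` class here is a local stand-in, and `wont_implement` and `SketchSeries` are hypothetical names introduced only for illustration.

```python
class WontImplementError(NotImplementedError):
    """Local stand-in for the error named in the issue (not Beam's class)."""


def wont_implement(name, reason):
    """Build a placeholder method that always raises, recording why."""
    def method(self, *args, **kwargs):
        raise WontImplementError(f"'{name}' is not supported: {reason}")
    method.__name__ = name
    return method


class SketchSeries:
    """Hypothetical deferred-series class used only for this sketch."""
    argsort = wont_implement("argsort", "order-sensitive")


try:
    SketchSeries().argsort()
except WontImplementError as e:
    message = str(e)

print(message)
```

Attaching the reason to the error lets a caller see why an operation is unavailable rather than hitting a bare missing-attribute failure.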



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
