[GitHub] [beam] TheNeuralBit commented on a change in pull request #16706: [BEAM-13605] Modify groupby.apply implementation in preparation for pandas 1.4.0

GitBox Wed, 02 Feb 2022 15:54:51 -0800


TheNeuralBit commented on a change in pull request #16706:
URL: https://github.com/apache/beam/pull/16706#discussion_r798110082




##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -4017,7 +4076,8 @@ def do_partition_apply(df):
                       by=grouping_columns or None)
 
       gb = project(gb)
-      return gb.apply(func, *args, **kwargs)
+

Review comment:
       Deleted!
   
   Generally, anything that passes the PythonLint PreCommit (which runs the pylint
   and yapf checkers) is fine. Those checks are less opinionated than some
   formatters (e.g. black), so they leave a decent amount of wiggle room, and odd
   things like this whitespace change can slip in. It's reasonable to point out
   anything like this that looks odd to you.

##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3975,7 +3973,19 @@ def apply(self, func, *args, **kwargs):
     object of the same type as what will be returned when the pipeline is
     processing actual data. If the result is a pandas object it should have the
     same type and name (for a Series) or column types and names (for
-    a DataFrame) as the actual results."""
+    a DataFrame) as the actual results.
+
+    Note that in pandas, ``apply`` attempts to detect if the index is unmodified
+    in ``func`` (indicating ``func`` is a transform) and drops the duplicate
+    index in the output. To determine this, pandas tests the indexes for
+    equality. However, Beam cannot do this since it is sensitive to the input
+    data, instead this implementation tests if the indexes are equivalent

Review comment:
       Done!
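
   For readers following the docstring above, here is a minimal sketch of why an
   index-equality test is sensitive to the input data. It is not pandas' internal
   logic or Beam's implementation; `index_unchanged` and `keep_positive` are
   hypothetical names used only for illustration. The same function looks like a
   transform on one input and not on another:

```python
import pandas as pd


def index_unchanged(func, group):
    """Hypothetical helper: True if func(group) has the same index as group."""
    result = func(group)
    return result.index.equals(group.index)


def keep_positive(df):
    # Filters rows, so whether the index survives intact depends on the data.
    return df[df["x"] > 0]


all_positive = pd.DataFrame({"x": [1, 2, 3]})
mixed = pd.DataFrame({"x": [1, -2, 3]})

# No rows are dropped here, so the index is unchanged.
same_on_all_positive = index_unchanged(keep_positive, all_positive)
# Row 1 is dropped here, so the index differs from the input's.
same_on_mixed = index_unchanged(keep_positive, mixed)

print(same_on_all_positive, same_on_mixed)  # True False
```

   Since a deferred pipeline cannot inspect the actual data when it is
   constructed, a check like this cannot be relied on there, which is the
   motivation the docstring gives for testing equivalence instead.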




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
