TheNeuralBit commented on a change in pull request #16706:
URL: https://github.com/apache/beam/pull/16706#discussion_r798110082
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -4017,7 +4076,8 @@ def do_partition_apply(df):
by=grouping_columns or None)
gb = project(gb)
- return gb.apply(func, *args, **kwargs)
+
Review comment:
Deleted!
Generally, anything that passes the PythonLint PreCommit (which runs the pylint
and yapf checkers) is fine. That check is not as opinionated as some checkers
(e.g. black), so it leaves a decent amount of wiggle room, and weird things like
this whitespace change can slip in. It's reasonable to point out anything like
this that looks odd to you.
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -3975,7 +3973,19 @@ def apply(self, func, *args, **kwargs):
object of the same type as what will be returned when the pipeline is
processing actual data. If the result is a pandas object it should have the
same type and name (for a Series) or column types and names (for
- a DataFrame) as the actual results."""
+ a DataFrame) as the actual results.
+
+ Note that in pandas, ``apply`` attempts to detect if the index is unmodified
+ in ``func`` (indicating ``func`` is a transform) and drops the duplicate
+ index in the output. To determine this, pandas tests the indexes for
+ equality. However, Beam cannot do this since it is sensitive to the input
+ data, instead this implementation tests if the indexes are equivalent
Review comment:
Done!
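
For anyone following the thread, here is a minimal sketch of why an index
equality test is sensitive to the input data, as the docstring above says. It
does not reproduce pandas' or Beam's actual internals; keep_positive and
index_unchanged are hypothetical helpers made up for illustration, and the
equality check is only in the spirit of what the docstring attributes to pandas.

```python
import pandas as pd


def keep_positive(s):
    # Hypothetical func: it filters rows, so it preserves the index only
    # when every value in the input happens to be positive.
    return s[s > 0]


def index_unchanged(func, s):
    # Equality test on concrete data (an assumption for illustration,
    # not pandas' real internal check).
    return func(s).index.equals(s.index)


all_positive = pd.Series([1, 2, 3])
mixed = pd.Series([1, -2, 3])

# The same func looks like a transform on one input and not on another.
unchanged_on_all_positive = index_unchanged(keep_positive, all_positive)
unchanged_on_mixed = index_unchanged(keep_positive, mixed)
```

Because the answer flips with the data, a check like this cannot be decided
ahead of time without looking at the real input, which is presumably why the
docstring says Beam needs a different test.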