[
https://issues.apache.org/jira/browse/BEAM-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329483#comment-17329483
]
Brian Hulette commented on BEAM-12016:
--------------------------------------
When we run the tests in pandas_doctests_test and frames_test, we verify that
each operation preserves partitioning the way it claims to,
[here|https://github.com/apache/beam/blob/c4cea56fba8183a8ef73c533f08ba4099ec9958f/sdks/python/apache_beam/dataframe/expressions.py#L96-L104].
So if the tests are passing you should be good!
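A toy model of this kind of check may help. It is a sketch under stated assumptions and is not Beam's code. A "partition" is a list of `(index_label, row)` pairs. "Partitioned by index" is taken to mean that every row sits in the partition chosen by hashing its index label. `stable_hash`, `partition_by_index`, `is_index_partitioned`, `check_preserves_index` and `frame_add_suffix` are hypothetical helpers. The real check lives at the `expressions.py` lines linked above and may work differently.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model, not Beam's API: a row is (index_label, {column: value}).
Row = Tuple[str, Dict[str, int]]
Partition = List[Row]


def stable_hash(label: str, n: int) -> int:
    # Deterministic stand-in for hash partitioning (built-in str hash() is salted).
    return sum(map(ord, label)) % n


def partition_by_index(rows: List[Row], n: int) -> List[Partition]:
    # Send each row to the partition picked by hashing its index label.
    parts: List[Partition] = [[] for _ in range(n)]
    for label, row in rows:
        parts[stable_hash(label, n)].append((label, row))
    return parts


def is_index_partitioned(parts: List[Partition]) -> bool:
    # Holds if every row is still where hashing its current label would put it.
    n = len(parts)
    return all(stable_hash(label, n) == i
               for i, part in enumerate(parts)
               for label, _ in part)


def check_preserves_index(op: Callable[[Partition], Partition],
                          rows: List[Row], n: int = 2) -> bool:
    # Partition the input by index, run op on each partition independently,
    # then verify that the output is still partitioned by index.
    outputs = [op(part) for part in partition_by_index(rows, n)]
    return is_index_partitioned(outputs)


def frame_add_suffix(part: Partition, suffix: str = "_x") -> Partition:
    # Like DataFrame.add_suffix: renames columns and leaves the index alone.
    return [(label, {col + suffix: v for col, v in row.items()})
            for label, row in part]


rows = [("a", {"v": 1}), ("b", {"v": 2}), ("c", {"v": 3}), ("d", {"v": 4})]
print(check_preserves_index(frame_add_suffix, rows))  # True
```

In this model, an operation that rewrites index labels can leave rows in partitions their new labels no longer hash to. That is the kind of mismatch such a check is meant to catch.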
Note that preserves_partition_by=Singleton() is actually a very weak (the
weakest) guarantee. It means that this operation does not preserve any
partitioning except for Singleton, which is trivial to preserve. Preserving
Singleton just means if all the data is on one machine to begin with, the
output will continue to be on one machine, which has to be true anyway.
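The same toy model shows why Singleton is trivial to preserve. If all rows start in one partition and the operation runs on each partition independently, the other partitions stay empty. `is_singleton`, `apply_per_partition` and `series_add_prefix` are hypothetical helpers, not Beam APIs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model, not Beam's API: a row is (index_label, {column: value}).
Row = Tuple[str, Dict[str, int]]
Partition = List[Row]


def is_singleton(parts: List[Partition]) -> bool:
    # Singleton: at most one partition holds any data.
    return sum(1 for part in parts if part) <= 1


def apply_per_partition(op: Callable[[Partition], Partition],
                        parts: List[Partition]) -> List[Partition]:
    # Each partition is processed on its own; no rows move between partitions.
    return [op(part) for part in parts]


def series_add_prefix(part: Partition, prefix: str = "x_") -> Partition:
    # Like Series.add_prefix: rewrites the index labels themselves.
    return [(prefix + label, row) for label, row in part]


# All data on "one machine": partition 0 holds everything, partitions 1-2 are empty.
parts = [[("a", {"v": 1}), ("b", {"v": 2})], [], []]
out = apply_per_partition(series_add_prefix, parts)
print(is_singleton(out))  # True
print(out[0])  # [('x_a', {'v': 1}), ('x_b', {'v': 2})]
```

Even though the labels change, the output stays in a single partition, so a Singleton claim cannot fail here.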
Just to double-check, you might try changing preserves_partition_by to something
more restrictive and see whether the tests fail. For example,
preserves_partition_by=Index() is a stronger promise, so the check above should
make the tests fail.
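Continuing the toy model, this uses the same hypothetical helpers and the same assumed definition of index partitioning, and it is not Beam's code. Claiming Index for a Series-style prefix, which rewrites index labels, fails the check. A DataFrame-style prefix only renames columns, so it passes. With two partitions and the prefix `"x_"`, whose character codes sum to 215 (an odd number), every relabeled row hashes to the other partition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model, not Beam's API: a row is (index_label, {column: value}).
Row = Tuple[str, Dict[str, int]]
Partition = List[Row]


def stable_hash(label: str, n: int) -> int:
    # Deterministic stand-in for hash partitioning.
    return sum(map(ord, label)) % n


def partition_by_index(rows: List[Row], n: int) -> List[Partition]:
    # Send each row to the partition picked by hashing its index label.
    parts: List[Partition] = [[] for _ in range(n)]
    for label, row in rows:
        parts[stable_hash(label, n)].append((label, row))
    return parts


def is_index_partitioned(parts: List[Partition]) -> bool:
    # Holds if every row is still where hashing its current label would put it.
    n = len(parts)
    return all(stable_hash(label, n) == i
               for i, part in enumerate(parts)
               for label, _ in part)


def check_preserves_index(op: Callable[[Partition], Partition],
                          rows: List[Row], n: int = 2) -> bool:
    # Partition by index, apply op per partition, then re-check the claim.
    return is_index_partitioned([op(p) for p in partition_by_index(rows, n)])


def series_add_prefix(part: Partition, prefix: str = "x_") -> Partition:
    # Like Series.add_prefix: the row labels (index) are prefixed.
    return [(prefix + label, row) for label, row in part]


def frame_add_prefix(part: Partition, prefix: str = "x_") -> Partition:
    # Like DataFrame.add_prefix: the column labels are prefixed, index unchanged.
    return [(label, {prefix + col: v for col, v in row.items()})
            for label, row in part]


rows = [("a", {"v": 1}), ("b", {"v": 2}), ("c", {"v": 3}), ("d", {"v": 4})]
print(check_preserves_index(series_add_prefix, rows))  # False
print(check_preserves_index(frame_add_prefix, rows))   # True
```

In the model, a Singleton claim leaves the check nothing it can falsify. An Index claim is tested against the data and fails when the operation rewrites the index.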
> Implement add_suffix, add_prefix for DataFrame and Series
> ---------------------------------------------------------
>
> Key: BEAM-12016
> URL: https://issues.apache.org/jira/browse/BEAM-12016
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Rogelio Miguel Hernandez Sandoval
> Priority: P3
> Labels: dataframe-api
>
> Add an implementation for
> [add_suffix|https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.add_suffix.html]
> and add_prefix that works for DeferredDataFrame and DeferredSeries, and is
> fully unit tested with some combination of pandas_doctests_test.py and
> frames_test.py.
> https://github.com/apache/beam/pull/14274 is an example of a typical PR that
> adds new operations.