[ https://issues.apache.org/jira/browse/BEAM-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329483#comment-17329483 ]

Brian Hulette commented on BEAM-12016:
--------------------------------------

When we run the tests in pandas_doctests_test and frames_test, we verify that 
each operation actually preserves the partitioning it claims to preserve; see 
[here|https://github.com/apache/beam/blob/c4cea56fba8183a8ef73c533f08ba4099ec9958f/sdks/python/apache_beam/dataframe/expressions.py#L96-L104].
 So if the tests pass, you should be good!

Note that preserves_partition_by=Singleton() is actually a very weak guarantee 
(the weakest, in fact). It means that this operation preserves no partitioning 
except Singleton, which is trivial to preserve: preserving Singleton just means 
that if all the data starts out on one machine, the output stays on one 
machine, which has to be true anyway.

Just to double-check, you might try changing preserves_partition_by to something 
stronger and see whether the tests fail. For example, 
preserves_partition_by=Index() is a stronger promise, and it should cause the 
check above to fail the tests.
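The idea behind that experiment can be sketched with a toy, pure-Python model of a partitioning check: split rows into partitions, run the operation on each partition independently, then test whether the output still satisfies the claimed partitioning. Everything here (toy_hash, index_partition, satisfies_index, satisfies_singleton, run_per_partition) is hypothetical and is not Beam's actual check in expressions.py, which works on pandas objects and Beam's own partitioning classes; the sum-of-code-points hash is a deliberately simple stand-in for a real hash.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a row is (index_label, value); a partition is a list of rows.
Row = Tuple[str, int]
Partition = List[Row]


def toy_hash(label: str) -> int:
    """Deterministic toy hash (sum of code points), standing in for a real hash."""
    return sum(map(ord, label))


def index_partition(rows: List[Row], n: int) -> List[Partition]:
    """Split rows so that equal index labels always land in the same partition."""
    parts: List[Partition] = [[] for _ in range(n)]
    for label, value in rows:
        parts[toy_hash(label) % n].append((label, value))
    return parts


def satisfies_index(parts: List[Partition]) -> bool:
    """Index-style partitioning holds if each row sits where its label hashes."""
    n = len(parts)
    return all(toy_hash(label) % n == i
               for i, part in enumerate(parts) for label, _ in part)


def satisfies_singleton(parts: List[Partition]) -> bool:
    """Singleton-style partitioning holds if at most one partition has data."""
    return sum(1 for part in parts if part) <= 1


def run_per_partition(op: Callable[[Partition], Partition],
                      parts: List[Partition]) -> List[Partition]:
    """Apply op to each partition independently, as a distributed runner would."""
    return [op(part) for part in parts]


def add_prefix(part: Partition, prefix: str = "x_") -> Partition:
    """Series-style add_prefix: relabels every index label in the partition."""
    return [(prefix + label, value) for label, value in part]


rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]

# Singleton: all data starts in one partition, so it trivially stays there.
singleton_out = run_per_partition(add_prefix, [rows])
print(satisfies_singleton(singleton_out))  # True

# Index: rows start hash-partitioned by label; relabeling breaks that layout.
index_out = run_per_partition(add_prefix, index_partition(rows, 3))
print(satisfies_index(index_out))  # False
```

The Singleton claim passes no matter what the operation does to the labels, which is why it is the weakest guarantee; the Index claim is only true for operations that leave each row's index where it was.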

> Implement add_suffix, add_prefix for DataFrame and Series
> ---------------------------------------------------------
>
>                 Key: BEAM-12016
>                 URL: https://issues.apache.org/jira/browse/BEAM-12016
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Rogelio Miguel Hernandez Sandoval
>            Priority: P3
>              Labels: dataframe-api
>
> Add an implementation for 
> [add_suffix|https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.add_suffix.html]
>  and add_prefix that works for DeferredDataFrame and DeferredSeries, and is 
> fully unit tested with some combination of pandas_doctests_test.py and 
> frames_test.py. 
> https://github.com/apache/beam/pull/14274 is an example of a typical PR that 
> adds new operations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
