[ 
https://issues.apache.org/jira/browse/BEAM-3736?focusedWorklogId=499429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499429
 ]

ASF GitHub Bot logged work on BEAM-3736:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Oct/20 14:26
            Start Date: 12/Oct/20 14:26
    Worklog Time Spent: 10m 
      Work Description: kamilwu commented on a change in pull request #13048:
URL: https://github.com/apache/beam/pull/13048#discussion_r503332709



##########
File path: sdks/python/apache_beam/transforms/core.py
##########
@@ -1970,10 +1985,14 @@ def add_input_types(transform):
       return combined
 
     if self.has_defaults:
-      combine_fn = (
-          self.fn if isinstance(self.fn, CombineFn) else
-          CombineFn.from_callable(self.fn))
-      default_value = combine_fn.apply([], *self.args, **self.kwargs)
+      combine_fn = copy.copy(

Review comment:
       This gives better protection against potential side effects.
   
   When default values are used, `CombineFn.apply` is called at pipeline 
construction time, and `CombineFn.setup` and `CombineFn.teardown` are called 
along with it. The same CombineFn instance is then serialized and sent to the 
runner. I think it would be better to perform the initial `CombineFn.apply` on 
a copy, so that the state of the original instance is not polluted.
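
The concern in the comment above can be illustrated with a minimal, standalone sketch. It does not import Beam: `TrackingCombineFn` and `default_value_for` are hypothetical stand-ins invented for illustration, and the real `CombineFn.apply` takes more arguments than shown here. The only point is that running `setup`, `apply` and `teardown` on a copy leaves the original instance's attributes as they were.

```python
import copy


class TrackingCombineFn:
    """Hypothetical stand-in for a CombineFn; not Beam's actual class."""

    def __init__(self):
        self.setup_calls = 0
        self.teardown_calls = 0

    def setup(self):
        # Record that setup ran on this particular instance.
        self.setup_calls += 1

    def teardown(self):
        # Record that teardown ran on this particular instance.
        self.teardown_calls += 1

    def apply(self, elements):
        # Simplified combine: sum the inputs (0 for an empty input).
        return sum(elements)


def default_value_for(fn):
    """Compute the default (empty-input) value on a shallow copy of fn."""
    fn_copy = copy.copy(fn)
    fn_copy.setup()
    try:
        return fn_copy.apply([])
    finally:
        fn_copy.teardown()


original = TrackingCombineFn()
default_value = default_value_for(original)
```

Here `original.setup_calls` and `original.teardown_calls` are still 0 afterwards, because the counters were rebound only on the copy. Note that `copy.copy` is shallow: a mutable attribute such as a list or dict would still be shared between the copy and the original, so in-place mutation of such state would remain visible on the original.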






Issue Time Tracking
-------------------

    Worklog Id:     (was: 499429)
    Time Spent: 3h 50m  (was: 3h 40m)

> Add SetUp() and TearDown() for CombineFns
> -----------------------------------------
>
>                 Key: BEAM-3736
>                 URL: https://issues.apache.org/jira/browse/BEAM-3736
>             Project: Beam
>          Issue Type: Improvement
>          Components: beam-model, sdk-py-core
>            Reporter: Chuan Yu Foo
>            Assignee: Kamil Wasilewski
>            Priority: P3
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> I have a CombineFn that has a large amount of state that needs to be loaded 
> once before it can add_input or merge_combiners (for example, the CombineFn 
> might load up a large lookup table used for combining). 
> Right now, to initialise this state, for each of the methods, I check if the 
> state has already been initialised, and if not, I initialise it. It would be 
> nice if CombineFn provided a SetUp() method that is called once to initialise 
> this state (and a corresponding TearDown() method to clean up this state if 
> necessary).
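
The two approaches contrasted in the description above can be sketched roughly as follows. This is an illustration only and does not import Beam: both classes, the tiny lookup table and the `loads` counter are hypothetical, and the sketch uses `merge_accumulators`, the method name in the Beam Python SDK, where the description says merge_combiners.

```python
class LazyInitCombineFn:
    """Hypothetical workaround: every method checks whether state is loaded."""

    def __init__(self):
        self._table = None
        self.loads = 0

    def _ensure_loaded(self):
        if self._table is None:
            self._table = {"a": 1, "b": 2}  # stand-in for an expensive load
            self.loads += 1

    def create_accumulator(self):
        return 0

    def add_input(self, acc, key):
        self._ensure_loaded()
        return acc + self._table[key]

    def merge_accumulators(self, accs):
        self._ensure_loaded()
        return sum(accs)

    def extract_output(self, acc):
        return acc


class SetupCombineFn:
    """Hypothetical alternative: state is loaded once in setup()."""

    def __init__(self):
        self._table = None
        self.loads = 0

    def setup(self):
        self._table = {"a": 1, "b": 2}  # stand-in for an expensive load
        self.loads += 1

    def teardown(self):
        self._table = None  # release the state

    def create_accumulator(self):
        return 0

    def add_input(self, acc, key):
        return acc + self._table[key]

    def merge_accumulators(self, accs):
        return sum(accs)

    def extract_output(self, acc):
        return acc


# Drive the setup-based version by hand, as a runner would be expected to.
fn = SetupCombineFn()
fn.setup()
acc = fn.create_accumulator()
for key in ["a", "b", "a"]:
    acc = fn.add_input(acc, key)
total = fn.extract_output(acc)
fn.teardown()
```

In the second class the per-method checks disappear, at the cost of relying on the caller to invoke `setup` before any combining and `teardown` afterwards.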


