[jira] [Work logged] (BEAM-6906) Mutating accumulators in fused stages is generally unsafe - need to provide a single mutable accumulator

ASF GitHub Bot (JIRA) Tue, 26 Mar 2019 10:14:46 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-6906?focusedWorklogId=218769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218769
 ]


ASF GitHub Bot logged work on BEAM-6906:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Mar/19 17:13
            Start Date: 26/Mar/19 17:13
    Worklog Time Spent: 10m 
      Work Description: robinyqiu commented on issue #8134: [BEAM-6906] Update 
spec on CombineFn and DoFn to clarify mutability of parameters
URL: https://github.com/apache/beam/pull/8134#issuecomment-476753567
 
 
   Thanks! I have addressed your comments and tagged the JIRA issue. I would 
like to separate this PR with a following one that deprecates 
`mergeAccumulators(Iterable<AccumT>)` if we want to do that. And I still have 
some concerns with the deprecation, explained below:
   
   1) Deprecating `abstract` method without sacrificing backwards-compatibility 
seems not easy:
   Taking the Java SDK for example, even if we do the deprecation, users will 
still have to implement the old version of mergeAccumulators and has no option 
to implement only the new one. This is because after the change we will have:
   `@Deprecated public abstract AccumT mergeAccumulators(Iterable<AccumT>)`
   and
   `public AccumT mergeAccumulators(AccumT, Iterable<AccumT>)` (by default 
calls the deprecated one)
   We have to keep the first one abstract and the second one non-abstract in 
order to guarantee backwards-compatibility. In this case, we are forcing users 
to implement a deprecated abstract function, and the new function doesn't help 
to make the interface cleaner/safer at all.
   
   2) Potential performance penalty:
   When we call the deprecated function from the new one, we will have to copy 
accumulators from one iterable to another. But I think this is a minor concern, 
compared to the first one.
   
   One idea: what if we keep the old interface, but make the requirement of 
`mergeAccumulators(Iterable<AccumT>)` even more stringent by saying "non of the 
accumulator is should be mutated"? As far as I see, this way we keep the 
interface clean and safe, and the performance penalty from forcing users to 
create a new empty accumulator inside `mergeAccumulators` is not very high (is 
this true?). The greatest drawback of this approach is that users who were 
implementing this function by mutating accumulators will see their programs 
break without any explicit notice.
   
   Please let me know if you have any idea! I would like to have further 
discussion on this issue.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 218769)
            Time Spent: 10m
    Remaining Estimate: 0h

> Mutating accumulators in fused stages is generally unsafe - need to provide a 
> single mutable accumulator
> --------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-6906
>                 URL: https://issues.apache.org/jira/browse/BEAM-6906
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>            Reporter: Kenneth Knowles
>            Assignee: Yueyang Qiu
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Our current docs encourage a CombineFn author to mutate accumulators for 
> efficiency. This is important, but cannot be done generally without losing 
> efficiency - it is not safe to share accumulators within a stage or across 
> sliding windows. The ownership story needs to be clear. Any accumulator that 
> is mutable is from that point on owned by the CombineFn, not the runner and 
> cannot be given to other steps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Work logged] (BEAM-6906) Mutating accumulators in fused stages is generally unsafe - need to provide a single mutable accumulator

Reply via email to