[
https://issues.apache.org/jira/browse/BEAM-6906?focusedWorklogId=218769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218769
]
ASF GitHub Bot logged work on BEAM-6906:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Mar/19 17:13
Start Date: 26/Mar/19 17:13
Worklog Time Spent: 10m
Work Description: robinyqiu commented on issue #8134: [BEAM-6906] Update
spec on CombineFn and DoFn to clarify mutability of parameters
URL: https://github.com/apache/beam/pull/8134#issuecomment-476753567
Thanks! I have addressed your comments and tagged the JIRA issue. I would
like to separate this PR with a following one that deprecates
`mergeAccumulators(Iterable<AccumT>)` if we want to do that. And I still have
some concerns with the deprecation, explained below:
1) Deprecating `abstract` method without sacrificing backwards-compatibility
seems not easy:
Taking the Java SDK for example, even if we do the deprecation, users will
still have to implement the old version of mergeAccumulators and has no option
to implement only the new one. This is because after the change we will have:
`@Deprecated public abstract AccumT mergeAccumulators(Iterable<AccumT>)`
and
`public AccumT mergeAccumulators(AccumT, Iterable<AccumT>)` (by default
calls the deprecated one)
We have to keep the first one abstract and the second one non-abstract in
order to guarantee backwards-compatibility. In this case, we are forcing users
to implement a deprecated abstract function, and the new function doesn't help
to make the interface cleaner/safer at all.
2) Potential performance penalty:
When we call the deprecated function from the new one, we will have to copy
accumulators from one iterable to another. But I think this is a minor concern,
compared to the first one.
One idea: what if we keep the old interface, but make the requirement of
`mergeAccumulators(Iterable<AccumT>)` even more stringent by saying "non of the
accumulator is should be mutated"? As far as I see, this way we keep the
interface clean and safe, and the performance penalty from forcing users to
create a new empty accumulator inside `mergeAccumulators` is not very high (is
this true?). The greatest drawback of this approach is that users who were
implementing this function by mutating accumulators will see their programs
break without any explicit notice.
Please let me know if you have any idea! I would like to have further
discussion on this issue.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 218769)
Time Spent: 10m
Remaining Estimate: 0h
> Mutating accumulators in fused stages is generally unsafe - need to provide a
> single mutable accumulator
> --------------------------------------------------------------------------------------------------------
>
> Key: BEAM-6906
> URL: https://issues.apache.org/jira/browse/BEAM-6906
> Project: Beam
> Issue Type: Bug
> Components: beam-model
> Reporter: Kenneth Knowles
> Assignee: Yueyang Qiu
> Priority: Critical
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Our current docs encourage a CombineFn author to mutate accumulators for
> efficiency. This is important, but cannot be done generally without losing
> efficiency - it is not safe to share accumulators within a stage or across
> sliding windows. The ownership story needs to be clear. Any accumulator that
> is mutable is from that point on owned by the CombineFn, not the runner and
> cannot be given to other steps.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)