[ https://issues.apache.org/jira/browse/BEAM-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17546237#comment-17546237 ]
Kenneth Knowles commented on BEAM-1164:
---------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/18121
> Allow a DoFn to opt in to mutating its input
> --------------------------------------------
>
> Key: BEAM-1164
> URL: https://issues.apache.org/jira/browse/BEAM-1164
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Reporter: Frances Perry
> Priority: P3
>
> Runners generally can't tell whether a DoFn mutates its inputs, but assuming
> by default that it does incurs a significant performance cost from
> unnecessary copying (around sibling fusion, etc.). So instead the model
> prohibits mutating inputs, and the Direct Runner validates this behavior. (See:
> http://beam.incubator.apache.org/contribute/design-principles/#make-efficient-things-easy-rather-than-make-easy-things-efficient)
>
> However, if users are processing a small number of large records by making
> incremental changes (for example, genomics use cases), the cost of the
> immutability requirement can be very large. As a workaround, users sometimes
> do suboptimal things (fusing ParDos by hand) or undefined things when they
> believe the immutability requirement is unnecessarily strict (adding no-op
> coders in places they hope the runner won't be materializing things, mutating
> things anyway when they don't expect sibling fusion to happen, etc.).
>
> We should consider adding a signal (MutatingDoFn?) that users explicitly opt
> in to, declaring that their code may mutate inputs. The runner can then use
> this signal to either avoid optimizations that would break in the face of
> mutation or insert additional copies as needed so that optimizations
> preserve semantics.
>
> See this related user@ discussion:
> https://lists.apache.org/thread.html/f39689f54147117f3fc54c498eff1a20fa73f1be5b5cad5b6f816fd3@%3Cuser.beam.apache.org%3E
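
To make the proposal above concrete, here is a minimal sketch of the "insert additional copies" option. It is a toy model in plain Python, not Beam code: the DoFn base class, the mutates_input flag (standing in for the suggested MutatingDoFn signal), and run_fused_siblings are all hypothetical names invented for illustration. Only DoFns that opt in pay for a copy; read-only siblings keep sharing the original element.

```python
import copy


class DoFn:
    """Hypothetical stand-in for a Beam DoFn; this is not the real Beam API."""

    # Hypothetical opt-in flag playing the role of the proposed signal.
    mutates_input = False

    def process(self, element):
        raise NotImplementedError


class AppendMarker(DoFn):
    """Mutates its input in place and declares that it does so."""

    mutates_input = True

    def process(self, element):
        element.append("marked")
        return element


class Length(DoFn):
    """Read-only sibling that consumes the same input element."""

    def process(self, element):
        return len(element)


def run_fused_siblings(element, dofns):
    """Toy 'runner' that hands one element to several sibling DoFns.

    Siblings that opted in receive a private deep copy, so their mutations
    cannot leak to others; the rest share the original object uncopied.
    """
    results = []
    for fn in dofns:
        arg = copy.deepcopy(element) if fn.mutates_input else element
        results.append(fn.process(arg))
    return results


record = ["a", "b"]
outputs = run_fused_siblings(record, [AppendMarker(), Length()])
```

In this sketch the mutating DoFn sees and returns its own modified copy, while Length still observes the original two-item record. A real runner could instead choose the other option described in the issue and simply decline to fuse siblings of an opted-in DoFn.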