[ 
https://issues.apache.org/jira/browse/BEAM-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17546237#comment-17546237
 ] 

Kenneth Knowles commented on BEAM-1164:
---------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/18121

> Allow a DoFn to opt in to mutating its input
> --------------------------------------------
>
>                 Key: BEAM-1164
>                 URL: https://issues.apache.org/jira/browse/BEAM-1164
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Frances Perry
>            Priority: P3
>
> Runners generally can't tell if a DoFn is mutating inputs, but assuming so by
> default carries a significant performance cost from unnecessary
> copying (around sibling fusion, etc). So instead the model prevents mutating
> inputs, and the Direct Runner validates this behavior. (See:
> http://beam.incubator.apache.org/contribute/design-principles/#make-efficient-things-easy-rather-than-make-easy-things-efficient)
>  
> However, if users are processing a small number of large records by making
> incremental changes (for example, genomics use cases), the cost of the
> immutability requirement can be very large. As a workaround, users sometimes
> do suboptimal things (fusing ParDos by hand) or undefined things when they
> believe the immutability requirement is unnecessarily strict (adding no-op
> coders in places they hope the runner won't be materializing things, mutating
> things anyway when they don't expect sibling fusion to happen, etc).
>
> We should consider adding a signal (MutatingDoFn?) that users explicitly opt
> in to in order to declare that their code may mutate inputs. The runner can
> then use this declaration to either prevent optimizations that would break in
> the face of mutation or insert additional copies as needed to allow
> optimizations to preserve semantics.
>
> See this related user@ discussion:
> https://lists.apache.org/thread.html/f39689f54147117f3fc54c498eff1a20fa73f1be5b5cad5b6f816fd3@%3Cuser.beam.apache.org%3E
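
The "insert additional copies" option in the quoted proposal could look roughly like the sketch below. It is plain Python with no Beam dependency, and it is only an illustration under stated assumptions: the `mutates_input` decorator and the `run_siblings` toy runner are hypothetical names invented here, not Beam APIs, and `copy.deepcopy` merely stands in for whatever copying mechanism a real runner would use.

```python
import copy


def mutates_input(fn):
    """Hypothetical opt-in marker: declares that fn may mutate its argument."""
    fn.mutates_input = True
    return fn


def run_siblings(element, fns):
    """Toy stand-in for a runner applying fused sibling functions to one element.

    Functions that opted in via @mutates_input receive a private deep copy;
    all other functions share the original element without copying.
    """
    results = []
    for fn in fns:
        if getattr(fn, "mutates_input", False):
            arg = copy.deepcopy(element)  # copy only where mutation is declared
        else:
            arg = element  # no copy needed for non-mutating functions
        results.append(fn(arg))
    return results


@mutates_input
def append_marker(record):
    # Mutates its input in place, which is why it carries the opt-in marker.
    record.append("edited")
    return record


def count_items(record):
    # Read-only sibling; must never observe the other function's mutation.
    return len(record)


record = ["a", "b"]
out = run_siblings(record, [append_marker, count_items])
```

In this sketch only the function that declares mutation pays for a copy, so the read-only sibling still sees the original two-item record. The other option from the proposal, declining to fuse siblings when one of them mutates, is not shown.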



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
