[ 
https://issues.apache.org/jira/browse/BEAM-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011483#comment-16011483
 ] 

ASF GitHub Bot commented on BEAM-2301:
--------------------------------------

GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/3156

    [BEAM-2301] Splits SplittableParDo into a core-construction part and a runners-core part

    SplittableParDo itself goes into core-construction, and expands into a
    slightly different transform.
    
    This change consists almost entirely of moving code around.
    
    Before:
    ```
    elements: InputT
    | pair with restriction -> ElementAndRestriction<InputT, RestrictionT>
    | split restriction -> same
    | explode windows -> same
    | assign unique key -> KV<String, ElementAndRestriction<InputT, RestrictionT>>
    | GBKIntoKeyedWorkItems -> KeyedWorkItem<String, ElementAndRestriction<InputT, RestrictionT>>
    | ProcessElements -> PCollection<OutputT>
    ```
    
    After:
    ```
    elements: InputT
    | ...
    | assign unique key -> KV<String, ElementAndRestriction<InputT, RestrictionT>>
    | SplittableProcessKeyed -> PCollection<OutputT>
    ```
    
    Most runners (except Dataflow) will still want to go through KeyedWorkItem.
    That part is encapsulated in `SplittableParDoViaKeyedWorkItems`, which has an
    `OverrideFactory` for `SplittableProcessKeyed` expanding it into the good old
    `GBKIntoKeyedWorkItems` and `ProcessElements`. So runner changes are very minor.
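    
    As a rough illustration only (a toy model, not Beam code), the override can
    be pictured as a substitution over the list of expansion steps. The class
    `ExpansionSketch`, the method `applyOverrides`, and the string-keyed map are
    hypothetical stand-ins invented for this sketch; the real `OverrideFactory`
    operates on transforms, not on strings.
    
    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    public class ExpansionSketch {
        // Tail of the "After" expansion from this description (earlier steps omitted).
        static final List<String> CORE_CONSTRUCTION_EXPANSION =
            Arrays.asList("assign unique key", "SplittableProcessKeyed");

        // Hypothetical stand-in for a runner's override table:
        // step name -> the steps that replace it.
        static final Map<String, List<String>> KEYED_WORK_ITEM_OVERRIDES =
            Collections.singletonMap(
                "SplittableProcessKeyed",
                Arrays.asList("GBKIntoKeyedWorkItems", "ProcessElements"));

        // Replaces every step that has an override; other steps pass through unchanged.
        static List<String> applyOverrides(
            List<String> steps, Map<String, List<String>> overrides) {
            List<String> result = new ArrayList<>();
            for (String step : steps) {
                List<String> replacement = overrides.get(step);
                if (replacement != null) {
                    result.addAll(replacement);
                } else {
                    result.add(step);
                }
            }
            return result;
        }

        public static void main(String[] args) {
            // A runner that goes through KeyedWorkItem applies the override.
            System.out.println(
                applyOverrides(CORE_CONSTRUCTION_EXPANSION, KEYED_WORK_ITEM_OVERRIDES));
            // A runner with no override keeps SplittableProcessKeyed as-is.
            System.out.println(
                applyOverrides(
                    CORE_CONSTRUCTION_EXPANSION,
                    Collections.<String, List<String>>emptyMap()));
        }
    }
    ```
    
    With the override applied, the last step becomes the old two-step form; with
    an empty table, `SplittableProcessKeyed` is left for the runner to translate
    itself.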
    
    Dataflow, however, cannot use runners-core during expansion, so it will
    translate `SplittableProcessKeyed` directly and perform its expansion
    service-side, and will instantiate `ProcessFn` worker-side.
    
    R: @tgroh 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam sdf-expansion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3156
    
----
commit be95bdd679fba755785a8e35a87eb1ec6c440882
Author: Eugene Kirpichov <[email protected]>
Date:   2017-05-15T22:54:03Z

    Splits SplittableParDo into a core-construction part and a KWI-related part

----


> Standard expansion of SDF should be in runners-core-construction
> ----------------------------------------------------------------
>
>                 Key: BEAM-2301
>                 URL: https://issues.apache.org/jira/browse/BEAM-2301
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> As should standard expansions of everything else.
> Since SplittableParDo (the standard expansion of SDF) uses KeyedWorkItem and 
> other things in runners-core that are not available in 
> runners-core-construction, it needs to be refactored somewhat.



