[ 
https://issues.apache.org/jira/browse/FLINK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517969#comment-14517969
 ] 

ASF GitHub Bot commented on FLINK-1927:
---------------------------------------

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/638

    [FLINK-1927] [py] Operator distribution rework

    Python operators are no longer serialized; they are instead rebuilt on
    each node. As a result, the dill library is no longer required.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink papipr_operator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #638
    
----
commit b587a50b6ca8564a9e246e2d58d1c0cee125fdca
Author: zentol <[email protected]>
Date:   2015-04-19T08:07:38Z

    [FLINK-1927] [py] Operator distribution rework

----


> [Py] Rework operator distribution
> ---------------------------------
>
>                 Key: FLINK-1927
>                 URL: https://issues.apache.org/jira/browse/FLINK-1927
>             Project: Flink
>          Issue Type: Improvement
>          Components: Python API
>    Affects Versions: 0.9
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>            Priority: Minor
>             Fix For: 0.9
>
>
> Currently, the Python operator is created when executing the Python plan
> file, serialized using dill, and saved as a byte[] in the Java function. It is
> then deserialized at runtime on each node.
> The current implementation is fairly hacky and imposes limitations
> that make it hard to work with. Chaining, or more generally storing other
> user code, always requires a separate deserialization step after
> deserializing the operator.
> These issues can be circumvented by rebuilding the (Python) plan on
> each node instead of serializing the operator. Plan creation is
> deterministic, and every operator is uniquely identified by an ID that is
> already known to the Java function.
> This change will allow us to easily support custom serializers.
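
For illustration only, the rebuild-by-ID idea described in the issue could look roughly like the sketch below. All names in it (Plan, build_plan, run_on_node) are hypothetical and are not taken from the Flink Python API or from the pull request. It only assumes what the issue states: plan creation is deterministic, and each operator has an ID the Java side already knows.

```python
# Hypothetical sketch: none of these names come from the Flink Python API.


class Plan:
    """Collects operators in creation order and assigns sequential IDs."""

    def __init__(self):
        self._operators = {}
        self._next_id = 0

    def add(self, func):
        # IDs depend only on the order of creation, so a deterministic
        # plan file yields the same ID for the same operator on every run.
        op_id = self._next_id
        self._next_id += 1
        self._operators[op_id] = func
        return op_id

    def get(self, op_id):
        return self._operators[op_id]


def build_plan():
    """Stand-in for the user's plan file (assumed to be deterministic)."""
    plan = Plan()
    plan.add(lambda x: x * 2)  # receives ID 0
    plan.add(lambda x: x + 1)  # receives ID 1
    return plan


def run_on_node(op_id, value):
    """Each node rebuilds the plan locally and looks up its operator by ID."""
    plan = build_plan()
    return plan.get(op_id)(value)


# Only the integer ID has to reach the node; no operator is serialized.
doubled = run_on_node(0, 10)
incremented = run_on_node(1, 10)
```

In this sketch nothing but the integer ID crosses the process boundary, which is why no serialization library is needed for the operators themselves.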



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
