[
https://issues.apache.org/jira/browse/FLINK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518974#comment-14518974
]
ASF GitHub Bot commented on FLINK-1927:
---------------------------------------
Github user mxm commented on the pull request:
https://github.com/apache/flink/pull/638#issuecomment-97353102
Wow! Great to see that we get rid of the only Python-side dependency. It
was a bit unclear under which terms we could ship the library anyway. Have you
measured how this affects performance? IMO the performance impact should be
near zero; it may even be faster now.
> [Py] Rework operator distribution
> ---------------------------------
>
> Key: FLINK-1927
> URL: https://issues.apache.org/jira/browse/FLINK-1927
> Project: Flink
> Issue Type: Improvement
> Components: Python API
> Affects Versions: 0.9
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Minor
> Fix For: 0.9
>
>
> Currently, the python operator is created when executing the python plan
> file, serialized using dill and saved as a byte[] in the java function. It is
> then deserialized at runtime on each node.
> The current implementation is fairly hacky, and imposes certain limitations
> that make it hard to work with. Chaining, or generally saving other
> user-code, always requires a separate deserialization step after
> deserializing the operator.
> These issues can be easily circumvented by rebuilding the (python) plan on
> each node, instead of serializing the operator. The plan creation is
> deterministic, and every operator is uniquely identified by an ID that is
> already known to the java function.
> This change will allow us to easily support custom serializers.
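
A minimal sketch of the approach described in the issue, assuming a simplified plan model: each node re-runs the deterministic plan construction and looks up its operator by ID instead of deserializing it. The names `Plan`, `add_operator`, `build_plan` and `load_operator` are hypothetical and are not taken from the Flink Python API.

```python
import itertools


class Plan:
    """Hypothetical stand-in for the Python plan; not Flink's actual API."""

    def __init__(self):
        # IDs are handed out in call order, so they depend only on the
        # order in which operators are added.
        self._ids = itertools.count()
        self.operators = {}

    def add_operator(self, func):
        op_id = next(self._ids)
        self.operators[op_id] = func
        return op_id


def build_plan():
    """Deterministic plan construction: same calls, same order, same IDs."""
    plan = Plan()
    plan.add_operator(lambda x: x + 1)  # gets ID 0
    plan.add_operator(lambda x: x * 2)  # gets ID 1
    return plan


def load_operator(op_id):
    """Assumed worker-side step: rebuild the plan, pick the operator by ID."""
    return build_plan().operators[op_id]
```

Because only the ID has to reach the node, no operator object is pickled in this sketch, which is why a serialization library such as dill would no longer be needed for this step.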