[jira] [Commented] (FLINK-3419) Drop partitionByHash from DataStream

Vasia Kalavri (JIRA) Tue, 16 Feb 2016 14:09:06 -0800

    [ https://issues.apache.org/jira/browse/FLINK-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149411#comment-15149411 ]


Vasia Kalavri commented on FLINK-3419:
--------------------------------------

Hey,
is it really necessary to drop this one? We are using it in gelly streaming and 
I think it's quite useful for certain streaming graph algorithms.
The thing is that we want to keep state per partition, not per key. For example, 
we partition by vertexID, so we can merge the state of several vertices into 
one within the same partition. If we do this with a keyBy, the state for each 
vertex grows too big (we end up storing duplicates). Can we reconsider 
dropping it, or is there another way to get the same behavior with keyBy?
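
To make the difference concrete, here is a minimal plain-Python sketch. It is 
not Flink code and does not use the DataStream API; the function names, the 
edge list, and the parallelism of 2 are all made up for illustration. It only 
compares how many entries end up stored when state is scoped per partition 
versus per key.

```python
from collections import defaultdict

NUM_PARTITIONS = 2  # hypothetical parallelism, chosen for illustration


def partition_of(vertex_id):
    # Stand-in for hash partitioning on the vertex ID (assumption, not Flink's hash).
    return hash(vertex_id) % NUM_PARTITIONS


def state_per_partition(edges):
    # One merged vertex set per partition, shared by every vertex routed there.
    state = defaultdict(set)
    for src, dst in edges:
        state[partition_of(src)].update((src, dst))
    return dict(state)


def state_per_key(edges):
    # One vertex set per key (the source vertex), the way keyed state is scoped.
    state = defaultdict(set)
    for src, dst in edges:
        state[src].update((src, dst))
    return dict(state)


def stored_entries(state):
    # Total number of vertex IDs held across all state objects.
    return sum(len(vertices) for vertices in state.values())


edges = [(0, 2), (2, 4), (4, 0), (1, 3)]  # hypothetical edge stream
by_partition = state_per_partition(edges)
by_key = state_per_key(edges)
```

With these four edges, the per-partition variant stores 5 vertex IDs in total, 
while the per-key variant stores 8, because vertices 0, 2 and 4 each appear in 
more than one key's state.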

> Drop partitionByHash from DataStream
> ------------------------------------
>
>                 Key: FLINK-3419
>                 URL: https://issues.apache.org/jira/browse/FLINK-3419
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Aljoscha Krettek
>            Assignee: Stephan Ewen
>            Priority: Blocker
>
> The behavior is no different from {{keyBy}}, except that you cannot use keyed 
> state and windows if you use {{partitionByHash}}, so I suggest dropping it.
> We might also want to think about dropping {{shuffle}} and {{global}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
