[ https://issues.apache.org/jira/browse/FLINK-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149411#comment-15149411 ]
Vasia Kalavri commented on FLINK-3419:
--------------------------------------
Hey,
is it really necessary to drop this one? We are using it in gelly streaming and
I think it's quite useful for certain streaming graph algorithms.
We want to keep state per partition, not per key. For example, we partition by
vertexID, so the state of several vertices that land in the same partition can
be merged into one. With keyBy, state is scoped to each vertex separately, so
it grows too big (we end up storing duplicates). Can we reconsider dropping it,
or is there another way to get the same behavior with keyBy?
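
To make the difference concrete, here is a rough sketch in plain Java. It does
not use the Flink API; the class name, the method names, the sample edges, and
the choice of "set of vertices seen" as the state are all assumptions made up
for illustration. It only counts how many entries are stored when state is
scoped per vertex versus per partition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java simulation of the two state scopes; no Flink classes are used.
class PartitionStateSketch {

    // Hypothetical stand-in for hash partitioning on the vertex ID.
    static int partitionOf(long vertexId, int parallelism) {
        return Math.floorMod(Long.hashCode(vertexId), parallelism);
    }

    // State scoped per key: every source vertex keeps its own vertex set.
    static int entriesWithKeyedState(List<long[]> edges) {
        Map<Long, Set<Long>> state = new HashMap<>();
        for (long[] edge : edges) {
            Set<Long> seen = state.computeIfAbsent(edge[0], k -> new HashSet<>());
            seen.add(edge[0]);
            seen.add(edge[1]);
        }
        return state.values().stream().mapToInt(Set::size).sum();
    }

    // State scoped per partition: all vertices routed to the same partition
    // share one merged vertex set.
    static int entriesWithPartitionState(List<long[]> edges, int parallelism) {
        Map<Integer, Set<Long>> state = new HashMap<>();
        for (long[] edge : edges) {
            int partition = partitionOf(edge[0], parallelism);
            Set<Long> seen = state.computeIfAbsent(partition, k -> new HashSet<>());
            seen.add(edge[0]);
            seen.add(edge[1]);
        }
        return state.values().stream().mapToInt(Set::size).sum();
    }

    // Made-up sample: a triangle on vertices 1, 3, 5 plus the edge (2, 4).
    static List<long[]> sampleEdges() {
        return Arrays.asList(
                new long[] {1, 3}, new long[] {3, 5}, new long[] {5, 1},
                new long[] {1, 5}, new long[] {3, 1}, new long[] {5, 3},
                new long[] {2, 4});
    }

    public static void main(String[] args) {
        List<long[]> edges = sampleEdges();
        System.out.println("keyed state entries: " + entriesWithKeyedState(edges));
        System.out.println("per-partition state entries: "
                + entriesWithPartitionState(edges, 2));
    }
}
```

With these sample edges and two partitions, the keyed variant stores 11 entries
(the triangle is kept three times, once per vertex) while the per-partition
variant stores 5.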
> Drop partitionByHash from DataStream
> ------------------------------------
>
> Key: FLINK-3419
> URL: https://issues.apache.org/jira/browse/FLINK-3419
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.0.0
> Reporter: Aljoscha Krettek
> Assignee: Stephan Ewen
> Priority: Blocker
>
> The behavior is no different from {{keyBy}}, except that you cannot use keyed
> state and windows if you use {{partitionByHash}}, so I suggest dropping it.
> We might also want to think about dropping {{shuffle}} and {{global}}.