GitHub user aljoscha commented on a diff in the pull request:
https://github.com/apache/flink/pull/747#discussion_r33129956
--- Diff:
flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/datastream/DataStream.java
---
@@ -301,16 +301,68 @@ public ExecutionConfig getExecutionConfig() {
}
/**
- * Groups the elements of a {@link DataStream} by the given key positions to
- * be used with grouped operators like
- * {@link GroupedDataStream#reduce(ReduceFunction)}</p> This operator also
- * affects the partitioning of the stream, by forcing values with the same
- * key to go to the same processing instance.
*
+ * It creates a new {@link KeyedDataStream} that uses the provided key for partitioning
+ * its operator states. Mind that keyBy does not affect the partitioning of the {@link DataStream}
--- End diff --
But keyBy does seem to affect the partitioning of the stream, since the
constructor of KeyedDataStream calls partitionByHash() on the DataStream.
(This also applies to the other keyBy() methods.)
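
To illustrate why a hash-partitioning call in the constructor matters, here is a
minimal sketch. It is not Flink code: `HashPartitionSketch` and `channelFor` are
hypothetical names, and the modulo scheme is only an assumption about how hash
partitioning typically routes records. The point is that once records are routed
by key hash, all values with the same key land on the same processing instance,
which is exactly a change to the stream's partitioning.

```java
public class HashPartitionSketch {

    // Hypothetical stand-in for hash partitioning: maps a key to one of
    // `parallelism` downstream instances. Not Flink's actual implementation.
    public static int channelFor(Object key, int parallelism) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        String[] keys = {"a", "b", "a", "c", "b", "a"};
        for (String key : keys) {
            // Every occurrence of the same key prints the same instance index.
            System.out.println(key + " -> instance " + channelFor(key, parallelism));
        }
    }
}
```

If keyBy() really should leave the partitioning untouched, the hash-partitioning
call would have to move out of the constructor; otherwise the Javadoc should say
that the stream is repartitioned.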
---