[
https://issues.apache.org/jira/browse/FLINK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064998#comment-16064998
]
Aljoscha Krettek commented on FLINK-6936:
-----------------------------------------
[~xccui] I agree with you that having an extra {{KeySelector}} when using a
custom {{Partitioner}} seems unnecessary. The only purpose for having a
{{KeySelector}} is when you want to use keyed state, which you cannot use if
you have a custom key selector.
Do you have a design document about how you want to approach the stream-stream
theta join? I would be especially interested in how you want to deal with
fault-tolerance, i.e. how the in-flight data is being buffered while waiting
for events to join with.
> Add multiple targets support for custom partitioner
> ---------------------------------------------------
>
> Key: FLINK-6936
> URL: https://issues.apache.org/jira/browse/FLINK-6936
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Reporter: Xingcan Cui
> Assignee: Xingcan Cui
> Priority: Minor
>
> The current user-facing Partitioner only allows returning one target.
> {code:java}
> @Public
> public interface Partitioner<K> extends java.io.Serializable, Function {
> /**
> * Computes the partition for the given key.
> *
> * @param key The key.
> * @param numPartitions The number of partitions to partition into.
> * @return The partition index.
> */
> int partition(K key, int numPartitions);
> }
> {code}
> Actually, this function should return multiple partitions and this may be a
> historical legacy.
> There could be at least three approaches to solve this.
> # Make the `protected DataStream<T> setConnectionType(StreamPartitioner<T>
> partitioner)` method in DataStream public and that allows users to directly
> define StreamPartitioner.
> # Change the `partition` method in the Partitioner interface to return an int
> array instead of a single int value.
> # Add a new `multicast` method to DataStream and provide a MultiPartitioner
> interface which returns an int array.
> Considering the consistency of API, the 3rd approach seems to be an
> acceptable choice. [~aljoscha], what do you think?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)