[
https://issues.apache.org/jira/browse/FLINK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055230#comment-16055230
]
Xingcan Cui commented on FLINK-6936:
------------------------------------
For self use, I have added a {{multicast}} method to {{DataStream}}, as well as
a {{MultiPartitioner}} class, which supports multi partitioning on records
themselves, rather than the keys.
This feature seems to be essential for the stream-stream theta-join I am
working on, since I need to partition and duplicate the source streams manually.
Considering the issue is API sensitive (and [~aljoscha] is on vacation:P),
[~fhueske], could you give me some suggestions? Thanks.
> Add multiple targets support for custom partitioner
> ---------------------------------------------------
>
> Key: FLINK-6936
> URL: https://issues.apache.org/jira/browse/FLINK-6936
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Reporter: Xingcan Cui
> Assignee: Xingcan Cui
> Priority: Minor
>
> The current user-facing Partitioner only allows returning one target.
> {code:java}
> @Public
> public interface Partitioner<K> extends java.io.Serializable, Function {
> /**
> * Computes the partition for the given key.
> *
> * @param key The key.
> * @param numPartitions The number of partitions to partition into.
> * @return The partition index.
> */
> int partition(K key, int numPartitions);
> }
> {code}
> Actually, this function should return multiple partitions and this may be a
> historical legacy.
> There could be at least three approaches to solve this.
> # Make the `protected DataStream<T> setConnectionType(StreamPartitioner<T>
> partitioner)` method in DataStream public and that allows users to directly
> define StreamPartitioner.
> # Change the `partition` method in the Partitioner interface to return an int
> array instead of a single int value.
> # Add a new `multicast` method to DataStream and provide a MultiPartitioner
> interface which returns an int array.
> Considering the consistency of API, the 3rd approach seems to be an
> acceptable choice. [~aljoscha], what do you think?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)