[ 
https://issues.apache.org/jira/browse/FLINK-24279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414892#comment-17414892
 ] 

ZHANG ZHIPENG commented on FLINK-24279:
---------------------------------------

Hi Dawid,

Thanks for the feedback. Using broadcast state may not be feasible due to the 
following two reasons: 

(1) we need priority-based data-consuming. Take the code snippet above as an 
example, we cannot consume any element from d1 before we consumed all of 
elements in d2. However, there seems no priority control in broadcast state in 
Streaming mode now.
(2) We want to support broadcast on arbitrary stream operators, e.g., oneInput, 
twoInput, multipleInput. However, with broadcast state users have to use 
BroadcastProcessFunction, which might require us to re-implement some 
operators' behavior with BroadcastProcessFunction.

By the way, we will not change the API of Datastream. A possible solution could 
be adding a utility functon in FlinkML.

> Support withBroadcast in DataStream
> -----------------------------------
>
>                 Key: FLINK-24279
>                 URL: https://issues.apache.org/jira/browse/FLINK-24279
>             Project: Flink
>          Issue Type: New Feature
>          Components: Library / Machine Learning
>            Reporter: ZHANG ZHIPENG
>            Priority: Major
>
> When doing machine learning using DataStream, we found that DataStream lacks 
> withBroadcast() function, which could be useful in machine learning.
>  
> A DataSet-based demo is like:
>  
> DataSet<?> d1 = ...;
> DataSet<?> d2 = ...;
> d1.map(new RichMapFunction <?, ?>() {
>       @Override
>       public Object map(Object aLong) throws Exception {
>            List<?> elements = getRuntimeContext().getBroadcastVariable("d2");
>            ...
>       }
> }).withBroadcastSet(d2, "d2");
>  
> When supporting withBroadcast() in DataStream, there ma
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to