Hey Luis,

this is correct, yes. Note that these are "only" limitations of the
current implementation and there is no fundamental reason it has to
work this way. The different characteristics of intermediate results
allow us to make trade-offs here as we see fit.

Furthermore, the type of an intermediate result describes when its
consumers can start consuming it. In streaming jobs, all results are
pipelined (consumers can start consuming as soon as the partition has
some data available). In batch jobs, you will find both pipelined and
blocking results (for blocking results, consumers can start consuming
only after the partition has all data available).
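To make the distinction concrete, here is a minimal sketch (plain Java, not Flink's actual classes; the enum and method names are illustrative only) of the rule that the result type encodes:

```java
// Illustrative sketch only, not Flink's actual API. It models the rule:
// pipelined results can be consumed once *some* data is available,
// blocking results only once *all* data has been produced.
enum IntermediateResultType {
    PIPELINED,
    BLOCKING;

    /** May a consumer start reading while the partition has only partial data? */
    boolean consumableWithPartialData() {
        return this == PIPELINED;
    }

    /** May a consumer read once the partition is finished? (Both types may.) */
    boolean consumableWhenFinished() {
        return true;
    }
}
```

In these terms, a streaming job uses only the pipelined behavior, while a batch job may mix both, which is why the scheduler can defer deploying the consumers of a blocking result until the producer has finished.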

Have you also seen this wiki page?

https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

– Ufuk

On Sun, Aug 13, 2017 at 6:59 PM, Luis Alves <lmtjal...@gmail.com> wrote:
> Hi,
>
> Can someone validate the following regarding the ExecutionGraph:
>
> Each IntermediateResult can only be consumed by a single ExecutionJobVertex, 
> i.e. if two ExecutionJobVertexes consume the same tuples (same "stream") 
> produced by the same ExecutionJobVertex, then the producer will have two 
> IntermediateResults, one per consumer.
>
> In other words: if an ExecutionJobVertex performs a map operation, and has 
> two consumers (different ExecutionJobVertex), the ExecutionJobVertex will 
> produce two datasets/IntermediateResults (both with the same “content”, but 
> different consumers).
>
> Each ExecutionVertex will then have the same number of 
> IntermediateResultPartitions as there are ExecutionJobVertexes that consume 
> the datasets generated by the respective ExecutionJobVertex.
>
> Thus, at runtime:
>
> ResultPartition maps to an IntermediateResultPartition (as documented in the 
> javadoc). Thus, 3. is also valid for ResultPartitions.
>
> ResultSubPartition maps to an ExecutionEdge (since it contains the 
> information on how to send the partition to the actual consumer Task).
>
> Thanks,
>
> Luís Alves
