Hey Ufuk,

Yes, I've seen that page (and the "// NOTE: currently we support only one
consumer per result!!!" comment in IntermediateResultPartition :) ).

Thanks,

Luís Alves

On Mon, 14 Aug 2017 at 10:57 Ufuk Celebi <u...@apache.org> wrote:

Hey Luis,

this is correct, yes. Note that these are "only" limitations of the
implementation and there is no fundamental reason to do it like this.
The different characteristics of intermediate results allow us to make
trade-offs here as seen fit.

Furthermore, the type of intermediate result describes when the
consumers can start consuming results. In streaming jobs, all results
are pipelined (consumers consume after the partition has some data
available). In batch jobs, you will find both pipelined and blocking
results (consumers consume only after the partition has all data
available).
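
In code terms, you can think of the distinction roughly like the sketch
below. This is only an illustration of the semantics, not the actual
runtime class (the runtime models this with ResultPartitionType, which
may differ in detail):

    // Illustrative sketch only; not the actual Flink enum.
    enum ResultKind {
        // Consumers can be deployed and start reading as soon as the
        // producer has emitted its first buffers. All results in
        // streaming jobs behave like this.
        PIPELINED,
        // Consumers are deployed only after the producer has finished
        // writing the complete result. Batch jobs mix this with the
        // pipelined kind.
        BLOCKING
    }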

Did you also see this Wiki page here?
https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
– Ufuk

On Sun, Aug 13, 2017 at 6:59 PM, Luis Alves <lmtjal...@gmail.com> wrote:

> Hi,
>
> Can someone validate the following regarding the ExecutionGraph:
>
> 1. Each IntermediateResult can only be consumed by a single ExecutionJobVertex,
> i.e. if two ExecutionJobVertices consume the same tuples (same "stream") that
> are produced by the same ExecutionJobVertex, then the producer will have two
> IntermediateResults, one per consumer.
>
> 2. In other words: if an ExecutionJobVertex performs a map operation and has
> two consumers (different ExecutionJobVertices), it will produce two
> datasets/IntermediateResults (both with the same "content", but with different
> consumers).
>
> 3. Each ExecutionVertex will then have the same number of
> IntermediateResultPartitions as there are ExecutionJobVertices consuming the
> datasets generated by the respective ExecutionJobVertex.
>
> Thus, at runtime:
>
> 4. ResultPartition maps to an IntermediateResultPartition (as documented in
> the Javadoc). Thus, 3. is also valid for ResultPartitions.
>
> 5. ResultSubpartition maps to an ExecutionEdge (since it contains the
> information on how to send the partition to the actual consumer Task).
>
> Thanks,
>
> Luís Alves
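
For what it's worth, here is a concrete (made-up) job that matches the
topology described in the quoted points, using the public DataStream
API; the names and numbers are just examples, and operator chaining is
disabled so that every operator stays its own JobVertex:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FanOutExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(2);
            env.disableOperatorChaining();

            // Producer ExecutionJobVertex: a map whose output is consumed twice.
            DataStream<Long> mapped = env
                    .generateSequence(0, 100)
                    .map(new MapFunction<Long, Long>() {
                        @Override
                        public Long map(Long value) {
                            return value * 2;
                        }
                    });

            // Two different downstream ExecutionJobVertices consuming
            // the same "stream".
            mapped.filter(v -> v % 2 == 0).print();
            mapped.filter(v -> v > 10).print();

            env.execute("fan-out example");
        }
    }

If I read the points above correctly, the map vertex here would own two
IntermediateResults (one per consuming vertex), each of its two parallel
ExecutionVertices would hold one IntermediateResultPartition per result,
and the ResultSubpartitions/ExecutionEdges then wire each partition to
the individual consuming subtasks.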
