Re: DataExchangeMode.BATCH in iterations

Fabian Hueske Mon, 01 Feb 2016 03:22:28 -0800

Hi Fridtjof,

the range partitioner works by building a histogram for the partitioning
key. This requires a pass over the whole intermediate data set which means
it needs to be materialized and cannot be processed in a pipelined fashion.
However, the data flows that execute iteration bodies require pipelined data
exchange strategies.
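The following minimal, single-process Java sketch shows why a range partitioner has to materialize its input. It is not Flink's actual implementation, and the class and method names are hypothetical. The boundaries are computed by sorting all keys. A sampling-based variant would also have to read the whole input before its boundaries are final. Hash partitioning is included for contrast, because it can route each record as soon as it arrives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-process model of range partitioning (not Flink's code).
public class RangePartitionSketch {

    // Blocking step: can only return after it has seen EVERY key.
    // This is the part that forces the input to be materialized.
    static int[] computeBoundaries(List<Integer> allKeys, int numPartitions) {
        if (allKeys.isEmpty() || numPartitions <= 1) {
            return new int[0];
        }
        int[] sorted = allKeys.stream().mapToInt(Integer::intValue).sorted().toArray();
        int[] bounds = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // The key at the i-th quantile becomes the lower bound of partition i.
            bounds[i - 1] = sorted[(int) ((long) i * sorted.length / numPartitions)];
        }
        return bounds;
    }

    // Per-record step: cheap, but usable only once the boundaries exist.
    static int partitionFor(int key, int[] bounds) {
        int p = 0;
        while (p < bounds.length && key >= bounds[p]) {
            p++;
        }
        return p;
    }

    // Contrast: hash partitioning needs no global information, so each
    // record can be forwarded immediately (pipelined).
    static int hashPartitionFor(int key, int numPartitions) {
        return Math.floorMod(Integer.hashCode(key), numPartitions);
    }

    static List<List<Integer>> rangePartition(List<Integer> input, int numPartitions) {
        // Phase 1: full pass over the (buffered) input to derive boundaries.
        int[] bounds = computeBoundaries(input, numPartitions);
        // Phase 2: second pass that routes every record to its partition.
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            out.add(new ArrayList<>());
        }
        for (int key : input) {
            out.get(partitionFor(key, bounds)).add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(42, 7, 19, 3, 88, 56, 23, 71, 11, 64, 30, 95);
        System.out.println(rangePartition(data, 3));
    }
}
```

Running main prints [[7, 19, 3, 11], [42, 56, 23, 30], [88, 71, 64, 95]]. Phase 1 cannot finish until the last input record has been seen. As a result, the producer's complete output has to be buffered before any record can be sent downstream, which corresponds to a blocking (batch) exchange rather than a pipelined one.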


This is not something that can easily be fixed at the moment. Touching this
part of the runtime code would have major implications.
I'm afraid we have to accept this restriction.

Best, Fabian


2016-02-01 11:47 GMT+01:00 Fridtjof Sander <fsan...@mailbox.tu-berlin.de>:

> Dear Flink-Devs,
>
> I recently ran into a problem where range-partitioning within iterations
> would be useful.
>
> The PR for range partitioning says this doesn't work because of
> some batched data-exchange mode.
> https://github.com/apache/flink/pull/1255
>
> I would like to understand the issue with that, but could not find
> articles/blog posts/etc to read about that.
>
> Do you have some pointers for me? Code will also work if the concept gets
> clear from it.
>
> Thanks for your time!
>
> Best, Fridtjof
>
