Re: load based stream partitioning

Gaurav Gupta Thu, 11 Feb 2016 12:06:18 -0800

Pramod,

How would it work with recovery? There could be cases where a tuple went to
P1 and post recovery it can go to P2


Gaurav

On Thu, Feb 11, 2016 at 11:56 AM, Pramod Immaneni <[email protected]>
wrote:

> Hi,
>
> There are scenarios where the downstream partitions of an upstream operator
> are generally not performing uniformly resulting in an overall sub-optimal
> performance dictated by the slowest partitions. The reasons could be data
> related such as some partitions are receiving more data to process than the
> others or could be environment related such as some partitions are running
> slower than others because they are on heavily loaded nodes.
>
> A solution based on currently available functionality in the engine would
> be to write a StreamCodec implementation to distribute data among the
> partitions such that each partition is receiving similar amount of data to
> process. We should consider adding StreamCodecs like these to the library
> but these however do not solve the problem when it is environment related.
>
> For that a better and more comprehensive approach would be look at how data
> is being consumed by the downstream partitions from the BufferServer and
> use that information to make decisions on how to send future data. If some
> partitions are behind others in consuming data then data can be directed to
> the other partitions. One way to do this would be to relay this type of
> statistical and positional information from BufferServer to the upstream
> publishers. The publishers can use this information in ways such as making
> it available to StreamCodecs to affect destination of future data.
>
> What do you think.
>
> Thanks
>

Re: load based stream partitioning

Reply via email to