Re: Handling skewness and Heterogeneity

Fabian Hueske Wed, 15 Feb 2017 01:36:58 -0800

Hi Anis,

Flink uses regular hash-partitioning to shuffle records and does not have a
mechanism to counter data skew (other than scaling out).
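The following is a minimal Python simulation, not Flink code, of why plain hash partitioning does not even out skew. Keys are drawn from a Zipf distribution, as in the question, and are assigned to partitions by a stand-in hash function (`zlib.crc32` modulo the parallelism). Flink's real key-to-subtask assignment is different, and all function names here are hypothetical.

```python
import random
import zlib
from collections import Counter


def zipf_keys(n_records, n_keys, s=1.0, seed=42):
    """Draw n_records keys from a truncated Zipf distribution over 0..n_keys-1."""
    rng = random.Random(seed)
    # Key of rank r (0-based) gets weight 1 / (r + 1)^s.
    weights = [1.0 / (rank + 1) ** s for rank in range(n_keys)]
    return rng.choices(range(n_keys), weights=weights, k=n_records)


def hash_partition(key, parallelism):
    """Stand-in hash partitioner: all records with the same key go to one partition."""
    return zlib.crc32(str(key).encode()) % parallelism


def partition_loads(keys, parallelism):
    """Count how many records each partition receives."""
    counts = Counter(hash_partition(k, parallelism) for k in keys)
    return [counts.get(p, 0) for p in range(parallelism)]


if __name__ == "__main__":
    keys = zipf_keys(n_records=100_000, n_keys=1_000)
    # Number of records carried by the single most frequent key.
    hottest = Counter(keys).most_common(1)[0][1]
    for parallelism in (4, 16, 64):
        loads = partition_loads(keys, parallelism)
        mean = sum(loads) / parallelism
        print(f"p={parallelism:3d} max={max(loads):6d} mean={mean:8.1f} "
              f"max/mean={max(loads) / mean:5.2f} hottest key={hottest}")
```

Because every record of a key lands in the same partition, the busiest partition always receives at least as many records as the hottest key has. Scaling out therefore lowers the mean load per partition, but not that floor.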
Heterogeneous hardware can (to some extent) be addressed by adapting the
number of processing slots (or task managers) per machine, i.e., configuring
fewer slots on machines with lower performance.
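One way to pick per-machine slot counts is to make them roughly proportional to each machine's relative speed. The sketch below is a hypothetical helper, not part of Flink. It assumes you can estimate a relative speed for every machine, and it uses largest-remainder rounding so the counts add up exactly to the total. Each resulting count would then go into that machine's `taskmanager.numberOfTaskSlots` setting.

```python
import math


def slots_per_machine(speeds, total_slots):
    """Split total_slots across machines in proportion to their relative speed.

    speeds: mapping of machine name -> relative speed (hypothetical estimates).
    Uses largest-remainder rounding so the counts sum exactly to total_slots.
    Note: a machine that is much slower than the rest may receive 0 slots.
    """
    total_speed = sum(speeds.values())
    # Ideal (fractional) number of slots for each machine.
    ideal = {m: total_slots * s / total_speed for m, s in speeds.items()}
    # Start from the rounded-down share of every machine.
    slots = {m: math.floor(x) for m, x in ideal.items()}
    # Give the leftover slots to the machines with the largest fractional remainders.
    remaining = total_slots - sum(slots.values())
    order = sorted(ideal, key=lambda m: ideal[m] - math.floor(ideal[m]), reverse=True)
    for m in order[:remaining]:
        slots[m] += 1
    return slots


if __name__ == "__main__":
    # Two machines twice as fast as the third, 10 slots in total.
    print(slots_per_machine({"fast-1": 2.0, "fast-2": 2.0, "slow-1": 1.0}, 10))
```

Proportional slots only help if each slot receives a similar share of the work, so this addresses heterogeneous hardware but not key skew.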


Best, Fabian

2017-02-15 2:12 GMT+01:00 Anis Nasir <[email protected]>:

> Dear All,
>
> I have a few use cases for Flink streaming where the cluster consists of
> heterogeneous machines.
>
> Additionally, there is skew present in both the input distribution (e.g.,
> each tuple is drawn from a Zipf distribution) and the service time (e.g.,
> the service time required for each tuple comes from a Zipf distribution).
>
> I want to know how Flink will handle such use cases, assuming that the
> distributions of both the workload and the cluster are not known in advance.
>
> Any help will be highly appreciated!
>
>
> Regards,
> Anis
>
