I have a load balancing problem and was wondering how to deal with this
kind of problem in Flink.
The input I have is a data set of grouped IDs that I join with metadata
for each ID. Then I need to compare each item in a group with every other
item in that group and, if necessary, split the group into different
subgroups. In Flink this is a join followed by a group reduce.
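Roughly, the current pipeline looks like this (a minimal sketch, assuming
hypothetical POJO types GroupedId and Metadata that both have an id field,
a groupKey field on GroupedId, and a CompareAllPairsFn group reduce
function):

    DataSet<Tuple2<GroupedId, Metadata>> joined = groupedIds
        .join(metadata)
        .where("id")
        .equalTo("id");

    DataSet<Result> results = joined
        .groupBy("f0.groupKey")                  // group by the group key
        .reduceGroup(new CompareAllPairsFn());   // all-pairs comparison per group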

The problem is that the groups differ a lot in size: 90% of the groups are
done in 5 minutes, while the rest take 2 hours. To make this more
efficient I would need to distribute the N-to-N comparison that is
currently done inside the group reduce function. Does anyone have an idea
how I can do that in a simple way?

My current idea is to make the group reduce step emit computation
partitions and then do the actual computation in a subsequent flat map
step. Would this solve the problem?
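A rough sketch of what I mean, assuming a hypothetical Item type, an
Item.of(...) conversion and a compare(...) method; here the computation
partitions are simply item pairs, and the rebalance() in between is an
extra assumption to actually redistribute the pairs across slots. The
sketch only covers the pairwise comparison, not the subgroup splitting
that would follow:

    DataSet<Tuple2<Item, Item>> workUnits = joined
        .groupBy("f0.groupKey")
        .reduceGroup(new GroupReduceFunction<Tuple2<GroupedId, Metadata>, Tuple2<Item, Item>>() {
            @Override
            public void reduce(Iterable<Tuple2<GroupedId, Metadata>> group,
                               Collector<Tuple2<Item, Item>> out) {
                List<Item> items = new ArrayList<>();
                for (Tuple2<GroupedId, Metadata> t : group) {
                    items.add(Item.of(t));                    // hypothetical conversion
                }
                // emit every unordered pair as an independent work unit
                for (int i = 0; i < items.size(); i++) {
                    for (int j = i + 1; j < items.size(); j++) {
                        out.collect(Tuple2.of(items.get(i), items.get(j)));
                    }
                }
            }
        });

    DataSet<Result> results = workUnits
        .rebalance()                                          // spread pairs evenly over all slots
        .flatMap(new FlatMapFunction<Tuple2<Item, Item>, Result>() {
            @Override
            public void flatMap(Tuple2<Item, Item> pair, Collector<Result> out) {
                compare(pair.f0, pair.f1, out);               // hypothetical comparison
            }
        });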

cheers Martin
