Question of fetching mapper output

orpl Thu, 13 Jul 2023 00:46:12 -0700

Hi Team,

I have a question on how a reducer should fetch the output of mappers.
As an example, consider this standard scenario:


1. There are 100 mappers and 50 reducers.

2. Each mapper creates 50 partitions, each of which is to be fetched by the corresponding reducer.

3. Each reducer is responsible for a single partition and tries to fetch 100 partitions (one from each mapper).

In our current implementation, a reducer calls rssShuffleClient.readPartition() 100 times (one for each mapper):


  rssShuffleClient.readPartition(..., mapIndex, mapIndex + 1)

My question is: if reducers start after the completion of all mappers, can we call (or should we try to call) rssShuffleClient.readPartition() only once, as in the following?


  rssShuffleClient.readPartition(..., 0, 100)

My understanding of a remote shuffle service (like Magnet for Spark) is that all the partitions destined for the same reducer are automatically merged by the shuffle service, so we thought that just a single call might be enough.
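
To make the two call patterns concrete, here is a minimal Java sketch of what we are comparing. It assumes the last two arguments form a half-open range [startMapIndex, endMapIndex) of mapper indices, which is how we read the (mapIndex, mapIndex + 1) call above. ShuffleReader and InMemoryShuffle are hypothetical stand-ins written only for this illustration, not the actual rssShuffleClient API, and the elided leading arguments are reduced to a single partition ID. The sketch only shows that the two patterns are equivalent under that assumption; whether the real service behaves this way is exactly the question.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadPartitionSketch {

    // Hypothetical stand-in for the shuffle client; NOT the real API.
    interface ShuffleReader {
        // Assumed semantics: return the records of one reduce partition
        // written by mappers in the half-open range [startMapIndex, endMapIndex).
        List<String> readPartition(int partitionId, int startMapIndex, int endMapIndex);
    }

    // Toy in-memory model: each mapper contributes one record per partition.
    static class InMemoryShuffle implements ShuffleReader {
        @Override
        public List<String> readPartition(int partitionId, int startMapIndex, int endMapIndex) {
            List<String> records = new ArrayList<>();
            for (int m = startMapIndex; m < endMapIndex; m++) {
                records.add("mapper-" + m + "/partition-" + partitionId);
            }
            return records;
        }
    }

    // Current approach: one call per mapper.
    static List<String> fetchPerMapper(ShuffleReader reader, int partitionId, int numMappers) {
        List<String> records = new ArrayList<>();
        for (int mapIndex = 0; mapIndex < numMappers; mapIndex++) {
            records.addAll(reader.readPartition(partitionId, mapIndex, mapIndex + 1));
        }
        return records;
    }

    // Proposed approach: a single call covering all mappers.
    static List<String> fetchOnce(ShuffleReader reader, int partitionId, int numMappers) {
        return reader.readPartition(partitionId, 0, numMappers);
    }

    public static void main(String[] args) {
        ShuffleReader reader = new InMemoryShuffle();
        List<String> perMapper = fetchPerMapper(reader, 7, 100);
        List<String> once = fetchOnce(reader, 7, 100);
        System.out.println("per-mapper calls: " + perMapper.size() + " records");
        System.out.println("single call:      " + once.size() + " records");
        System.out.println("same result:      " + perMapper.equals(once));
    }
}
```

In this toy model both approaches return the same 100 records in the same order; the only difference is 100 calls versus one.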


Thanks,

--- Sungwoo Park
