Hi Sungwoo,

It's really great to hear that! To be honest, we never expected such things would happen.
Just curious, would it be possible for you to contribute the MR3 integration to the Celeborn community? It would be a great feature for Celeborn, and the community could work together to better support MR3 (and Hive).

Thanks,
Keyong Zhou

<[email protected]> wrote on Sun, Jul 16, 2023 at 22:33:

> We have extended the implementation of MR3 so that all partition
> inputs can be fetched with a single call, e.g.:
>
> rssShuffleClient.readPartition(..., 0, 100)
>
> Now, Hive-MR3 with Celeborn runs as fast as Hive-MR3 with its own shuffle
> handlers when tested with the 10TB TPC-DS benchmark. For some queries, it is
> even noticeably faster.
>
> Thanks,
>
> --- Sungwoo
>
> On Thu, 13 Jul 2023, [email protected] wrote:
>
> > Hi Team,
> >
> > I have a question on how a reducer should fetch the output of mappers.
> > As an example, consider this standard scenario:
> >
> > 1. There are 100 mappers and 50 reducers.
> > 2. Each mapper creates 50 partitions, each of which is to be fetched by the
> >    corresponding reducer.
> > 3. Each reducer is responsible for a single partition and tries to fetch 100
> >    partitions (one from each mapper).
> >
> > In our current implementation, a reducer calls
> > rssShuffleClient.readPartition() 100 times (once for each mapper):
> >
> > rssShuffleClient.readPartition(..., mapIndex, mapIndex + 1)
> >
> > My question is: if reducers start after the completion of all mappers, can we
> > call (or should we try to call) rssShuffleClient.readPartition() only once,
> > as in:
> >
> > rssShuffleClient.readPartition(..., 0, 100)
> >
> > My understanding of a remote shuffle service (like Magnet for Spark) is that
> > all the partitions destined to the same reducer are automatically merged by
> > the shuffle service, so we thought that just a single call might be enough.
> >
> > Thanks,
> >
> > --- Sungwoo Park
> >
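For readers of the archive, a minimal sketch of the two fetch patterns discussed above. The arguments to readPartition other than the map-index range are elided ("...") in the thread, so the RssShuffleClient interface below is a hypothetical stand-in rather than the actual Celeborn ShuffleClient API, whose signature takes additional arguments and varies across versions; the shuffleId/partitionId/attemptNumber parameters are assumptions, and the numbers (100 mappers, 50 reducers) come from the example scenario.

    import java.io.IOException;
    import java.io.InputStream;

    // Hypothetical stand-in for the shuffle client used in the thread; only the
    // trailing (startMapIndex, endMapIndex) range mirrors the calls shown above.
    interface RssShuffleClient {
        InputStream readPartition(int shuffleId, int partitionId, int attemptNumber,
                                  int startMapIndex, int endMapIndex) throws IOException;
    }

    class ReducerFetchPatterns {
        static final int NUM_MAPPERS = 100;  // example scenario: 100 mappers

        // Original approach: one readPartition call per mapper, i.e. 100 calls,
        // each restricted to the map-index range [mapIndex, mapIndex + 1).
        static void fetchPerMapper(RssShuffleClient client, int shuffleId,
                                   int partitionId, int attemptNumber) throws IOException {
            for (int mapIndex = 0; mapIndex < NUM_MAPPERS; mapIndex++) {
                try (InputStream in = client.readPartition(
                        shuffleId, partitionId, attemptNumber, mapIndex, mapIndex + 1)) {
                    consume(in);
                }
            }
        }

        // Extended approach: reducers start only after all mappers have finished,
        // so a single call over the full map-index range [0, 100) returns the
        // reducer's entire partition as one merged stream.
        static void fetchAllAtOnce(RssShuffleClient client, int shuffleId,
                                   int partitionId, int attemptNumber) throws IOException {
            try (InputStream in = client.readPartition(
                    shuffleId, partitionId, attemptNumber, 0, NUM_MAPPERS)) {
                consume(in);
            }
        }

        private static void consume(InputStream in) throws IOException {
            byte[] buf = new byte[64 * 1024];
            while (in.read(buf) != -1) {
                // feed bytes into the reducer's merge/sort pipeline
            }
        }
    }

The single-range call relies on the assumption stated in the thread: if reducers start only after all mappers have completed, the shuffle service can hand back the reducer's whole partition in one stream instead of 100 per-mapper streams, which is what the extended MR3 implementation reportedly does.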
