Yeah, the way you described it, maybe not — because the hellabytes are all coming from one rack. But in reality, wouldn't the traffic be more uniform because of how Hadoop/HDFS works (data distributed more evenly across racks)?
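For what it's worth, HDFS's default placement for a block with replication factor 3 is one replica on the writer's node, and two replicas on different nodes of a single remote rack — so at rack granularity, blocks do spread out rather than piling up on one rack. Here's a toy simulation of that policy (a sketch, not Hadoop's actual code, which is Java):

```python
import random

def place_replicas(writer_rack, racks):
    """Rack-level sketch of HDFS default placement (replication = 3):
    one replica on the writer's rack, two on a single remote rack."""
    remote = random.choice([r for r in racks if r != writer_rack])
    return [writer_rack, remote, remote]

# Tally where replicas land for many writers spread across four racks.
racks = ["A", "B", "C", "D"]
counts = {r: 0 for r in racks}
for _ in range(10_000):
    for rack in place_replicas(random.choice(racks), racks):
        counts[rack] += 1
```

With writers spread evenly, every rack ends up holding roughly the same share of replicas, so shuffle sources aren't concentrated behind one rack's uplink.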
And if that's true, then wouldn't every packet crossing the inter-rack switch have been preceded by a consultation with the tracker? Well, I'm just theorizing, and I'm sure we'll see more concrete numbers on the relation between # racks, # nodes, # switches, # trackers and their configurations.

I like your idea about racking the trackers, though. So it won't need any tracker trackers?!? ;)

On Mon, 06 Jun 2011 09:40:12 -0400, John Armstrong <[email protected]> wrote:
> On Mon, 06 Jun 2011 09:34:56 -0400, <[email protected]> wrote:
>> Yeah, that's a good point.
>>
>> I wonder, though, what the load on the tracker nodes (ports et al.)
>> would be if an inter-rack fiber switch running at tens of Gb/s is
>> getting maxed out.
>>
>> Seems to me that if there is that much traffic being moved across
>> racks, the tracker node (or whatever node it is) would overload
>> first?
>
> It could happen, but I don't think it always would. For example: the
> tracker is on rack A; it sees that the best place to put reducer R is
> on rack B; it sees the reducer still needs a few hellabytes from
> mapper M on rack C; it tells M to send its data to R; the switches on
> B and C get throttled, leaving A free to handle other things.
>
> In fact, it almost makes me wonder if an ideal setup is not only to
> have each of the main control daemons on its own node, but to put
> THOSE nodes on their own rack and keep all the data elsewhere.
