Not sure that will make the interconnect faster, but it is worth a try.
On Mon, Jun 6, 2011 at 7:44 PM, Mauricio Cavallera <[email protected]> wrote:
> Unsubscribe
> On Jun 6, 2011 10:54 a.m., "Joey Echeverria" <[email protected]> wrote:
>> Most of the network bandwidth used during a MapReduce job should come
>> from the shuffle/sort phase. This part doesn't use HDFS. The
>> TaskTrackers running reduce tasks will pull intermediate results from
>> TaskTrackers running map tasks over HTTP. In most cases, it's
>> difficult to get rack locality during this process because of the
>> contract between mappers and reducers. If you wanted locality, your
>> data would already have to be grouped by key in your source files and
>> you'd need to use a custom partitioner.
>>
>> -Joey
>>
>> On Mon, Jun 6, 2011 at 9:49 AM, <[email protected]> wrote:
>>>
>>> Yeah, the way you described it, maybe not, because the hellabytes
>>> are all coming from one rack. But in reality, wouldn't this be
>>> more uniform because of how hadoop/hdfs work (distributed more evenly)?
>>>
>>> And if that is true, then for all the switched packets passing through
>>> the inter-rack switch, a consultation to the tracker would have preceded
>>> it?
>>>
>>> Well, I'm just theorizing, and I'm sure we'll see more concrete numbers
>>> on the relation between # racks, # nodes, # switches, # trackers and
>>> their configurations.
>>>
>>> I like your idea about racking the trackers though. So it won't need any
>>> tracker trackers?!? ;)
>>>
>>> On Mon, 06 Jun 2011 09:40:12 -0400, John Armstrong
>>> <[email protected]> wrote:
>>>> On Mon, 06 Jun 2011 09:34:56 -0400, <[email protected]> wrote:
>>>>> Yeah, that's a good point.
>>>>>
>>>>> I wonder, though, what the load on the tracker nodes (ports et al.) would
>>>>> be if an inter-rack fiber switch at tens of Gb/s is getting maxed.
>>>>>
>>>>> Seems to me that if there is that much traffic being moved across
>>>>> racks, the tracker node (or whatever node it is) would overload
>>>>> first?
>>>>
>>>> It could happen, but I don't think it always would. For example, the tracker
>>>> is on rack A; sees that the best place to put reducer R is on rack B; sees
>>>> the reducer still needs a few hellabytes from mapper M on rack C; tells M to
>>>> send data to R; switches on B and C get throttled, leaving A free to handle
>>>> other things.
>>>>
>>>> In fact, it almost makes me wonder if an ideal setup is not only to have
>>>> each of the main control daemons on their own nodes, but to put THOSE nodes
>>>> on their own rack and keep all the data elsewhere.
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
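For anyone wondering what the custom partitioner Joey mentions might look like, here is a minimal sketch. The class name and the a-m / n-z key-range rule are purely illustrative assumptions, not anything from the thread; a real partitioner would mirror however the source files are already grouped by key.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical key-range partitioner: keys starting a-m go to the lower half
// of the reducers, everything else to the upper half. Only useful if the
// source data is already grouped along the same boundary.
public class KeyRangePartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String k = key.toString();
    char first = k.isEmpty() ? 'a' : Character.toLowerCase(k.charAt(0));
    int half = Math.max(1, numPartitions / 2);
    int bucket = (first >= 'a' && first <= 'm') ? 0 : 1;
    // spread keys within the chosen half using the usual hash-and-mod trick
    int offset = (key.hashCode() & Integer.MAX_VALUE) % half;
    return Math.min(bucket * half + offset, numPartitions - 1);
  }
}

Wiring it in is just job.setPartitionerClass(KeyRangePartitioner.class) on the Job, but whether the shuffle actually stays rack-local still depends on where the matching map and reduce tasks get scheduled, which is exactly the caveat Joey raises.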
