On Jun 6, 2011, 10:54 a.m., "Joey Echeverria" <[email protected]> wrote:
> Most of the network bandwidth used during a MapReduce job should come
> from the shuffle/sort phase. This part doesn't use HDFS. The
> TaskTrackers running reduce tasks will pull intermediate results from
> TaskTrackers running map tasks over HTTP. In most cases, it's
> difficult to get rack locality during this process because of the
> contract between mappers and reducers: any given reducer may need
> output from every mapper. If you wanted locality, your data would
> already have to be grouped by key in your source files and you'd
> need to use a custom partitioner.
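>
> To make that concrete, here's a minimal sketch of such a partitioner
> (the class name and the "rackId/recordKey" key convention are made up
> for illustration; Hadoop doesn't ship anything like this):
>
>     import org.apache.hadoop.io.IntWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.mapreduce.Partitioner;
>
>     // Hypothetical: assumes map output keys look like "rackId/recordKey",
>     // i.e. the data was already grouped by rack in the source files.
>     public class RackPrefixPartitioner extends Partitioner<Text, IntWritable> {
>       @Override
>       public int getPartition(Text key, IntWritable value, int numPartitions) {
>         String rackId = key.toString().split("/", 2)[0];
>         // Every key with the same rack prefix goes to the same reduce
>         // task, so one rack's shuffle traffic converges on one reducer.
>         return (rackId.hashCode() & Integer.MAX_VALUE) % numPartitions;
>       }
>     }
>
> You'd register it with job.setPartitionerClass(RackPrefixPartitioner.class).
> Even then, the partitioner only groups the traffic; the scheduler still
> decides where the reduce task actually runs, so this isn't a locality
> guarantee by itself.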
>
> -Joey
>
> On Mon, Jun 6, 2011 at 9:49 AM, <[email protected]> wrote:
>>
>> Yeah, the way you described it, maybe not, because the hellabytes are
>> all coming from one rack. But in reality, wouldn't the traffic be more
>> uniform, given how Hadoop/HDFS distribute data evenly across nodes?
>>
>> And if that is true, then every packet passing through the inter-rack
>> switch would have been preceded by a consultation with the tracker?
>>
>> Well, I'm just theorizing, and I'm sure we'll see more concrete numbers
>> on the relation between # racks, # nodes, # switches, # trackers and
>> their configurations.
>>
>> I like your idea about racking the trackers, though. So it won't need
>> any tracker trackers?!? ;)
>>
>> On Mon, 06 Jun 2011 09:40:12 -0400, John Armstrong
>> <[email protected]> wrote:
>>> On Mon, 06 Jun 2011 09:34:56 -0400, <[email protected]> wrote:
>>>> Yeah, that's a good point.
>>>>
>>>> I wonder, though, what the load on the tracker nodes (ports and the
>>>> like) would be if an inter-rack fiber switch running at tens of Gb/s
>>>> is getting maxed out.
>>>>
>>>> Seems to me that if there is that much traffic moving across racks,
>>>> the tracker node (or whatever node it is) would overload first?
>>>
>>> It could happen, but I don't think it always would. For example: the
>>> tracker is on rack A; it sees that the best place to put reducer R is
>>> on rack B; it sees that reducer still needs a few hellabytes from
>>> mapper M on rack C; it tells M to send its data to R; the switches on
>>> B and C get throttled, leaving A free to handle other things.
>>>
>>> In fact, it almost makes me wonder if an ideal setup is not only to
>>> have each of the main control daemons on their own nodes, but to put
>>> THOSE nodes on their own rack and keep all the data elsewhere.
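>>>
>>> (For context: the only reason the tracker can "see" racks A, B, and C
>>> at all is Hadoop's rack awareness, which is pluggable. Here's a minimal
>>> sketch of a custom mapping, assuming the single-method 0.20-era
>>> interface and a made-up hostname convention; most clusters just point
>>> topology.script.file.name at a shell script instead:
>>>
>>>     import java.util.ArrayList;
>>>     import java.util.List;
>>>     import org.apache.hadoop.net.DNSToSwitchMapping;
>>>
>>>     // Resolves node names/IPs to rack paths for the NameNode and
>>>     // JobTracker to use in placement decisions.
>>>     public class StaticRackMapping implements DNSToSwitchMapping {
>>>       public List<String> resolve(List<String> names) {
>>>         List<String> racks = new ArrayList<String>();
>>>         for (String name : names) {
>>>           // Hypothetical convention: hosts named "node-a12", "node-b3",
>>>           // etc. live on racks "a", "b", ... Unknown hosts fall back
>>>           // to a default rack.
>>>           if (name.length() > 5 && name.startsWith("node-")) {
>>>             racks.add("/rack-" + name.charAt(5));
>>>           } else {
>>>             racks.add("/default-rack");
>>>           }
>>>         }
>>>         return racks;
>>>       }
>>>     }
>>>
>>> You'd wire it in via topology.node.switch.mapping.impl in the config.)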
>>
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434