OK, I just found what I was looking for. I hadn't read the documentation all the way to the end :-/
Druilhe Remi wrote:
> Thanks for your answer :)
>
> Allen Wittenauer wrote:
>> On Apr 26, 2010, at 6:23 AM, Druilhe Remi wrote:
>>
>>> For example, when I run the "wordcount" example, there are HDFS
>>> communications and MapReduce communications, and I am not able to
>>> distinguish which packets belong to HDFS and which to MapReduce.
>>
>> This shouldn't be too surprising given that the MapReduce job needs to
>> talk to HDFS to determine input and to write output.
>>
> You are right, there is a link between HDFS and MapReduce, but I had hoped
> to capture each set of communications in a separate file so I could deal
> with each independently.
>
>>> A way could be to use odd port numbers for HDFS and even port numbers
>>> for MapReduce, but I think I would have to modify the source code.
>>
>> The ports for the services are already separated out.
>>
>> In general, client -> server connections map out as:
>>
>> MR -> MR, HDFS
>> HDFS -> HDFS
>>
> But is there an easy way to determine which port belongs to which process
> once the sockets are opened? Because Hadoop runs inside a JVM, I can't use
> netstat: I can see which port is connected, but not which process inside
> the JVM uses it.
>
> Hadoop uses log4j; maybe there is a property that can give me what I am
> looking for.
>
>> Given a small 3-node grid, a dump of which processes open which ports,
>> and which connections are made between all the machines, it should be
>> trivial to build a more complete connection map. [You can probably even
>> do it as a map reduce job. :) ]
>
> Regards,
>
> Rémi Druilhe
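For anyone hitting the same question about mapping ports to processes: each Hadoop daemon (NameNode, DataNode, JobTracker, TaskTracker) runs in its own JVM, so netstat does work — you just need the PID of each daemon first. `jps` (which ships with the JDK) lists the Java processes by class name, and you can then filter `netstat` or `lsof` on that PID. A rough sketch, assuming Linux and a stock single-node install (the PID shown is only an illustration):

```shell
# List the running Hadoop daemon JVMs and their PIDs.
jps
# e.g.  4721 NameNode
#       4850 DataNode
#       4992 JobTracker
#       5130 TaskTracker

# For one daemon's PID, show the TCP sockets that JVM has open.
# -t TCP only, -n numeric addresses, -p show owning PID/program
# (root is typically needed to see other users' sockets).
sudo netstat -tnp | grep 4721

# Alternatively, with lsof: TCP sockets AND (-a) belonging to that PID.
sudo lsof -iTCP -a -p 4721
```

Running this once per daemon gives exactly the "which process opens which port" dump Allen describes, without touching the Hadoop source.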
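And for capturing the two kinds of traffic in separate files: since the services listen on distinct ports, tcpdump capture filters on those ports will split the traffic for you. The port numbers below are only the common defaults — the real values come from your own configuration (`fs.default.name`, `dfs.datanode.address`, `dfs.datanode.ipc.address`, `mapred.job.tracker`, `mapred.task.tracker.http.address`), so check `core-site.xml`/`hdfs-site.xml`/`mapred-site.xml` first:

```shell
# HDFS traffic into one capture file:
# 8020  = NameNode IPC (fs.default.name; sometimes 9000)
# 50010 = DataNode data transfer, 50020 = DataNode IPC
sudo tcpdump -i eth0 -w hdfs.pcap \
    'tcp port 8020 or tcp port 50010 or tcp port 50020'

# MapReduce traffic into another:
# 8021  = JobTracker IPC (mapred.job.tracker; sometimes 9001)
# 50060 = TaskTracker HTTP (map-output shuffle)
sudo tcpdump -i eth0 -w mapred.pcap \
    'tcp port 8021 or tcp port 50060'
```

The resulting `.pcap` files can then be analyzed independently (e.g. in Wireshark), which is the per-subsystem separation asked about above — no odd/even port scheme or source modification needed.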
