Thanks for your answer :)

Allen Wittenauer wrote:
> On Apr 26, 2010, at 6:23 AM, Druilhe Remi wrote:
>   
>> For example, when I run "wordcount" example, there is HDFS communications 
>> and MapReduce communications and I am not able to distinguish which packet 
>> belong to HDFS or to MapReduce.
>>     
>
> This shouldn't be too surprising given that the MapReduce job needs to talk 
> to HDFS to determine input and to write output.
>   
You're right, there is a link between HDFS and MapReduce, but I was hoping
to capture each kind of communication in a separate file so I could deal
with each one independently.
>> A way could be to use odd port number for HDFS and even port number for 
>> MapReduce, but I think I have to modify source code.
>>     
>
> The ports for the services are already separated out.  
>
> In general, client -> server connections map out as:
>
> MR -> MR, HDFS
> HDFS -> HDFS
>   
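If the services really do keep to separate ports, one way to split a capture after the fact is to classify each packet by its port pair. A minimal sketch in Python, assuming the stock hadoop-default.xml port assignments (9000 NameNode RPC, 50010/50070/50075 DataNode, 9001 JobTracker RPC, 50030/50060 JobTracker/TaskTracker HTTP) — verify these against your own conf/*-site.xml before relying on them:

```python
# Assumed default Hadoop 0.20-era ports; adjust to your site configuration.
HDFS_PORTS = {9000, 50010, 50070, 50075}
MAPRED_PORTS = {9001, 50030, 50060}

def classify(src_port, dst_port):
    """Return 'hdfs', 'mapred', or 'other' for one packet's port pair."""
    ports = {src_port, dst_port}
    if ports & HDFS_PORTS:
        return "hdfs"
    if ports & MAPRED_PORTS:
        return "mapred"
    return "other"
```

Feeding it the per-packet ports (e.g., exported from a capture tool) would let you write HDFS and MapReduce traffic to separate files.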
But is there an easy way to determine which port belongs to which
process once the sockets are opened? Because Hadoop runs inside a JVM, I
can't use netstat: I can see which port is connected, but not which
process in the JVM uses it.

Hadoop uses log4j, maybe there is a property that can give me what I am
looking for.
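One possibility (an assumption on my part, not verified against this Hadoop version): raising the RPC layer's logger to DEBUG in conf/log4j.properties should make each daemon log the connections it handles, and since each daemon writes its own log file, that ties a port to a specific process:

```properties
# Sketch only: enable DEBUG on Hadoop's IPC package so connection activity
# shows up in each daemon's own log (NameNode, DataNode, JobTracker,
# TaskTracker each log separately).
log4j.logger.org.apache.hadoop.ipc=DEBUG
```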
> Given a small 3 node grid, a dump of what processes open what ports, and what 
> connections are made between all the machines, it should be trivial to make a 
> more complex connection map.  [You can probably even do it as a map reduce 
> job. :) ]
Regards,

Rémi Druilhe
