Thanks for your answer :)

Allen Wittenauer wrote:
> On Apr 26, 2010, at 6:23 AM, Druilhe Remi wrote:
>
>> For example, when I run the "wordcount" example, there are HDFS
>> communications and MapReduce communications, and I am not able to
>> distinguish which packets belong to HDFS and which to MapReduce.
>
> This shouldn't be too surprising given that the MapReduce job needs to talk
> to HDFS to determine input and to write output.

You're right, there is a link between HDFS and MapReduce, but I was hoping to capture each set of communications in a separate file so I could analyze them independently.

>> A way could be to use odd port numbers for HDFS and even port numbers for
>> MapReduce, but I think I would have to modify the source code.
>
> The ports for the services are already separated out.
>
> In general, client -> server connections map out as:
>
> MR -> MR, HDFS
> HDFS -> HDFS

But is there an easy way to determine which port belongs to which process once the sockets are opened? Because Hadoop runs inside a JVM, I can't simply use netstat: I can see which ports are connected, but not which component inside the JVM is using them.
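One thing I might try (a sketch, assuming `jps` from the JDK and `lsof` are installed on the nodes): each Hadoop daemon (NameNode, DataNode, JobTracker, TaskTracker) runs in its own JVM, so `jps` can name the daemon behind each java PID, and `lsof -i` on that PID can then list its open ports. Something along these lines:

```shell
#!/bin/sh
# Sketch: map each Hadoop daemon JVM to the TCP ports it is listening on.
# Assumes jps (JDK) and lsof are available; the daemon names below are the
# standard Hadoop ones (NameNode, DataNode, JobTracker, TaskTracker).

# Extract listening port numbers from `lsof -Pan -p PID -i` output on stdin.
# The NAME column ($9) looks like "*:50070" or "10.0.0.1:8020"; the
# "(LISTEN)" state is field $10.
ports_from_lsof() {
    awk '$10 == "(LISTEN)" { n = split($9, a, ":"); print a[n] }' | sort -un
}

# Intended usage on a node (commented out so the sketch stays self-contained):
# jps | grep -E 'NameNode|DataNode|JobTracker|TaskTracker' |
# while read pid name; do
#     echo "== $name (pid $pid) =="
#     lsof -Pan -p "$pid" -i | ports_from_lsof
# done
```

With the listening ports labeled per daemon, the capture could then be split with tcpdump/Wireshark filters on those ports.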
Hadoop uses log4j; maybe there is a property that can give me what I am looking for.

> Given a small 3 node grid, a dump of what processes open what ports, and what
> connections are made between all the machines, it should be trivial to make a
> more complex connection map. [You can probably even do it as a map reduce
> job. :) ]

Regards,
Rémi Druilhe
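P.S. On the log4j idea, a possible starting point (untested on my side, and the logger names should be checked against your Hadoop version) would be to raise the IPC and daemon loggers to DEBUG in conf/log4j.properties, so that connection setup gets logged with its addresses:

```
# Hypothetical additions to conf/log4j.properties -- verify the package
# names against your Hadoop version before relying on them.
log4j.logger.org.apache.hadoop.ipc=DEBUG
log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode=DEBUG
log4j.logger.org.apache.hadoop.mapred=DEBUG
```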
