OK, I just found what I was looking for. I hadn't read the documentation all the way to the end :-/
Druilhe Remi wrote:
> Thanks for your answer :)
>
> Allen Wittenauer wrote:
>> On Apr 26, 2010, at 6:23 AM, Druilhe Remi wrote:
>>
>>> For example, when I run the "wordcount" example, there are HDFS
>>> communications and MapReduce communications, and I am not able to
>>> distinguish which packets belong to HDFS and which to MapReduce.
>>
>> This shouldn't be too surprising given that the MapReduce job needs to
>> talk to HDFS to determine input and to write output.
>>
> You are right, there is a link between HDFS and MapReduce, but I had hoped
> to capture each set of communications in a separate file so I could deal
> with each independently.
>
>>> A way could be to use odd port numbers for HDFS and even port numbers
>>> for MapReduce, but I think I would have to modify the source code.
>>
>> The ports for the services are already separated out.
>>
>> In general, client -> server connections map out as:
>>
>> MR -> MR, HDFS
>> HDFS -> HDFS
>>
> But is there an easy way to determine which port belongs to which process
> once the sockets are opened? Because Hadoop runs inside a JVM, I can't use
> netstat: I can see which port is connected, but not which process inside
> the JVM uses it.
>
> Hadoop uses log4j; maybe there is a property that can give me what I am
> looking for.
>
>> Given a small 3-node grid, a dump of which processes open which ports,
>> and which connections are made between all the machines, it should be
>> trivial to build a more complete connection map. [You can probably even
>> do it as a map reduce job. :) ]
>
> Regards,
>
> Rémi Druilhe
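For anyone hitting the same question about mapping ports to processes: each Hadoop daemon (NameNode, DataNode, JobTracker, TaskTracker) runs in its own JVM, so netstat does work — you just need the PID of each daemon first. `jps` (which ships with the JDK) lists the Java processes by class name, and you can then filter `netstat` or `lsof` on that PID. A rough sketch, assuming Linux and a stock single-node install (the PID shown is only an illustration):

```shell
# List the running Hadoop daemon JVMs and their PIDs.
jps
# e.g.  4721 NameNode
#       4850 DataNode
#       4992 JobTracker
#       5130 TaskTracker

# For one daemon's PID, show the TCP sockets that JVM has open.
# -t TCP only, -n numeric addresses, -p show owning PID/program
# (root is typically needed to see other users' sockets).
sudo netstat -tnp | grep 4721

# Alternatively, with lsof: TCP sockets AND (-a) belonging to that PID.
sudo lsof -iTCP -a -p 4721
```

Running this once per daemon gives exactly the "which process opens which port" dump Allen describes, without touching the Hadoop source.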
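And for capturing the two kinds of traffic in separate files: since the services listen on distinct ports, tcpdump capture filters on those ports will split the traffic for you. The port numbers below are only the common defaults — the real values come from your own configuration (`fs.default.name`, `dfs.datanode.address`, `dfs.datanode.ipc.address`, `mapred.job.tracker`, `mapred.task.tracker.http.address`), so check `core-site.xml`/`hdfs-site.xml`/`mapred-site.xml` first:

```shell
# HDFS traffic into one capture file:
# 8020  = NameNode IPC (fs.default.name; sometimes 9000)
# 50010 = DataNode data transfer, 50020 = DataNode IPC
sudo tcpdump -i eth0 -w hdfs.pcap \
    'tcp port 8020 or tcp port 50010 or tcp port 50020'

# MapReduce traffic into another:
# 8021  = JobTracker IPC (mapred.job.tracker; sometimes 9001)
# 50060 = TaskTracker HTTP (map-output shuffle)
sudo tcpdump -i eth0 -w mapred.pcap \
    'tcp port 8021 or tcp port 50060'
```

The resulting `.pcap` files can then be analyzed independently (e.g. in Wireshark), which is the per-subsystem separation asked about above — no odd/even port scheme or source modification needed.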
