[ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378313 ]
Dominik Friedrich commented on HADOOP-195:
------------------------------------------

Has anybody tried to use APR (the Apache Portable Runtime) with a JNI wrapper, as Tomcat does? With such a wrapper you could use OS features like sendfile, epoll, the random number generator and so on. I haven't used it myself; I just saw some performance tests with JBoss Web, which uses it.

This might be a bit off topic, but Java NIO has been mentioned before. I played around with Java NIO some weeks ago to see where it could be useful in Nutch/Hadoop. In my simple tests I found no significant performance improvement in file IO. I guess the tests were just too simple (serializing/deserializing Java objects to/from disk) to give useful results.

I also tested network throughput with a multiplexed socket compared to the one-thread-per-client design. With NIO the throughput was almost independent of the number of concurrent connections, while the threading overhead became very significant with 100+ threads.

My testbed was a simple server with two IO threads and a few worker threads, plus a bunch of clients that sent messages (serialized Java objects) to the server. On the server side, one IO thread read messages from the socket and put them into a blocking queue, and the other IO thread read outgoing messages from another blocking queue and sent them. The worker threads pulled messages from the in-queue, worked on them (in my test they just copied the message) and put their results on the out-queue. This way the server could handle a few thousand connections without problems. This design or something similar might be useful for the namenode or distributed search, as mentioned before.
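The queue-based server design described above could be sketched roughly as follows. This is only an illustration, not the original test code: the class name `QueuePipeline`, the `run` method and the `POISON` shutdown marker are hypothetical, and an in-memory list of strings stands in for the NIO sockets so that the example stays self-contained. It shows only the hand-off between one reading IO thread, a small worker pool and one writing IO thread via two blocking queues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuePipeline {
    // Hypothetical shutdown marker; compared by identity, never by content.
    private static final String POISON = new String("POISON");

    // Pushes every message through in-queue -> workers -> out-queue and
    // returns what the "sending" thread saw. Order is not guaranteed.
    public static List<String> run(List<String> messages, int workers)
            throws InterruptedException {
        BlockingQueue<String> inQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> outQueue = new LinkedBlockingQueue<>();
        List<String> sent = new ArrayList<>();

        // Stand-in for the IO thread that reads messages from the sockets.
        Thread reader = new Thread(() -> {
            try {
                for (String m : messages) inQueue.put(m);
                // One marker per worker so that every worker terminates.
                for (int i = 0; i < workers; i++) inQueue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Workers: take a message, "process" it (here: copy it), queue the result.
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            pool.add(new Thread(() -> {
                try {
                    String m;
                    while ((m = inQueue.take()) != POISON) {
                        outQueue.put(new String(m));
                    }
                    outQueue.put(POISON); // tell the writer this worker is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        // Stand-in for the IO thread that sends outgoing messages.
        Thread writer = new Thread(() -> {
            try {
                int finished = 0;
                while (finished < workers) {
                    String m = outQueue.take();
                    if (m == POISON) finished++; else sent.add(m);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        for (Thread t : pool) t.start();
        writer.start();
        reader.join();
        for (Thread t : pool) t.join();
        writer.join();
        return sent;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(Arrays.asList("a", "b", "c"), 2));
    }
}
```

In a real server the reader and writer would presumably be driven by a `java.nio.channels.Selector` over non-blocking channels instead of a list, but the number of threads would still be fixed regardless of how many clients are connected, which is the point of the design.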
> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>
> The data transfer of the map output should be done via http instead of
> rpc, because rpc is very slow for this application and the timeout behavior
> is suboptimal. (The server sends data and the client ignores it because it
> took more than 10 seconds to be received.)
