[ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378313 ]

Dominik Friedrich commented on HADOOP-195:
------------------------------------------

Has anybody tried to use the APR (Apache Portable Runtime) through a JNI 
wrapper, as Tomcat does? With this wrapper you could use OS features like 
sendfile, epoll, the system random number generator and so on. I haven't used 
it myself; I have only seen some performance tests with JBoss Web, which uses 
it.

This might be a bit off topic, but Java NIO has been mentioned before. I 
played around with Java NIO a few weeks ago to see where it could be useful in 
Nutch/Hadoop. In my simple tests I found no significant performance 
improvement in file IO. I guess the tests were just too simple 
(serializing/deserializing Java objects to/from disk) to give useful results.

I also tested the network throughput of a multiplexed socket compared to the 
one-thread-per-client design. With NIO the throughput was almost independent 
of the number of concurrent connections, while the threading overhead became 
very significant with 100+ threads.
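
For readers unfamiliar with the multiplexed approach, here is a minimal sketch 
of it using java.nio: a single thread and one Selector serve every connection. 
It is an illustration only, not the code used in the test above. The class name 
MultiplexedEchoServer, the echo behavior and the 4096-byte buffer are 
assumptions made for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical example class: one thread multiplexes all client sockets.
class MultiplexedEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    MultiplexedEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        // Port 0 lets the OS pick a free port; see port() below.
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096); // assumed buffer size
        try {
            while (true) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buf);
                    }
                }
            }
        } catch (ClosedSelectorException | IOException e) {
            // Selector was closed or failed: leave the loop.
        }
    }

    // Reads whatever is available and writes it straight back.
    private static void echo(SocketChannel client, ByteBuffer buf) {
        try {
            buf.clear();
            if (client.read(buf) < 0) { // peer closed the connection
                client.close();
                return;
            }
            buf.flip();
            // Simplification: spins if the send buffer is full. A real server
            // would register OP_WRITE and resume when the socket is writable.
            while (buf.hasRemaining()) {
                client.write(buf);
            }
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    void close() throws IOException {
        selector.close(); // also wakes up a blocked select()
        server.close();
    }
}
```

To try it, run the server in its own thread (new Thread(server).start()) and 
connect any number of plain blocking sockets to server.port(). The number of 
server threads stays at one no matter how many clients connect.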

My testbed was a simple server with two IO threads and a few worker threads, 
plus a bunch of clients that sent messages (serialized Java objects) to the 
server. On the server side, one IO thread read messages from the sockets and 
put them into a blocking queue, and the other IO thread took outgoing messages 
from another blocking queue and sent them. The worker threads pulled messages 
from the in-queue, worked on them (in my test they just copied the message) and 
put their results on the out-queue. This way the server could handle a few 
thousand connections without problems. This design or something similar might 
be useful for the namenode or distributed search, as mentioned before.
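
The queue hand-off in this design can be sketched roughly as follows. This is 
a reconstruction for illustration, not the original test code. The class name 
QueuePipeline, the byte[] message type and the unbounded LinkedBlockingQueue 
are assumptions, and the two IO threads are left out: whatever reads from the 
sockets would call inQueue.put(), and whatever writes to them would call 
outQueue.take().

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class: worker pool between an in-queue and an out-queue.
class QueuePipeline {
    // Filled by the reading IO thread (not shown).
    final BlockingQueue<byte[]> inQueue = new LinkedBlockingQueue<>();
    // Drained by the writing IO thread (not shown).
    final BlockingQueue<byte[]> outQueue = new LinkedBlockingQueue<>();
    private final Thread[] workers;

    QueuePipeline(int workerCount) {
        workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(this::workLoop, "worker-" + i);
            workers[i].setDaemon(true);
        }
    }

    void start() {
        for (Thread w : workers) {
            w.start();
        }
    }

    // Each worker blocks on the in-queue, processes one message and
    // hands the result to the out-queue.
    private void workLoop() {
        try {
            while (true) {
                byte[] msg = inQueue.take();
                byte[] result = msg.clone(); // the "work": copy the message
                outQueue.put(result);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdown requested
        }
    }

    void shutdown() {
        for (Thread w : workers) {
            w.interrupt();
        }
    }
}
```

With more than one worker, results can reach the out-queue in a different 
order than the requests arrived, so a real server would have to tag each 
message with the connection it belongs to.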

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3

>
> The data transfer of the map output should be transferred via http instead 
> of rpc, because rpc is very slow for this application and the timeout 
> behavior is suboptimal. (server sends data and client ignores it because it 
> took more than 10 seconds to be received.)

