[ http://issues.apache.org/jira/browse/HADOOP-312?page=all ]
Devaraj Das updated HADOOP-312:
-------------------------------
Attachment: no_conn_caching.patch
Increasing the accept queue length and adding a simple retry mechanism helped a
great deal. I tested two cases: (1) idle connections are cached for a maximum of
1 second at the client, and (2) connections are fully cached.
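
To make the two cases concrete, here is a minimal sketch of a client-side cache
that drops a connection once it has been idle longer than a limit. It is not
the code in the attached patch. The class name IdleConnectionCache, the
connector callback, and the explicit nowMillis argument are all hypothetical
and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not the code from no_conn_caching.patch.
final class IdleConnectionCache<K, C extends AutoCloseable> {

    private static final class Entry<C> {
        final C conn;
        long lastUsedMillis;

        Entry(C conn, long lastUsedMillis) {
            this.conn = conn;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    private final long maxIdleMillis;
    private final Function<K, C> connector; // opens a new connection to a server
    private final Map<K, Entry<C>> cache = new HashMap<>();

    IdleConnectionCache(long maxIdleMillis, Function<K, C> connector) {
        this.maxIdleMillis = maxIdleMillis;
        this.connector = connector;
    }

    // Returns the cached connection if it was used recently enough,
    // otherwise closes the stale one and opens a fresh connection.
    synchronized C get(K server, long nowMillis) {
        Entry<C> entry = cache.get(server);
        if (entry != null && nowMillis - entry.lastUsedMillis > maxIdleMillis) {
            closeQuietly(entry.conn);
            cache.remove(server);
            entry = null;
        }
        if (entry == null) {
            entry = new Entry<>(connector.apply(server), nowMillis);
            cache.put(server, entry);
        }
        entry.lastUsedMillis = nowMillis;
        return entry.conn;
    }

    private static void closeQuietly(AutoCloseable conn) {
        try {
            conn.close();
        } catch (Exception ignored) {
            // The connection is being discarded anyway.
        }
    }
}
```

In this sketch, case (1) corresponds to maxIdleMillis = 1000 and case (2) to
maxIdleMillis = Long.MAX_VALUE, so that a cached connection never expires.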
The performance of the sort benchmark (total time it takes to complete the run)
is better with (1) most of the time. However, a few tasks fail here and there
in both cases, so it is hard to say anything conclusive about performance in
terms of the time it takes to run the benchmark. I made the accept queue length
configurable (since it can also be set manually on Linux systems as part of the
configurable TCP/IP parameters), with a default of 128.
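
As a rough sketch of the two changes described above (a configurable accept
queue length and a simple client retry), the following uses only the standard
java.net classes. It is not the code in the attached patch. The names
BacklogAndRetry, listen, connectWithRetry, and the retry parameters are
hypothetical. Only the default of 128 comes from this comment.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration only; not the code from no_conn_caching.patch.
final class BacklogAndRetry {

    static final int DEFAULT_BACKLOG = 128; // default mentioned in this comment

    // Server side: the second ServerSocket argument is the requested
    // accept queue length (backlog). Non-positive values fall back to the default.
    static ServerSocket listen(int port, int backlog) throws IOException {
        return new ServerSocket(port, backlog > 0 ? backlog : DEFAULT_BACKLOG);
    }

    // Client side: try to connect up to maxAttempts times, pausing between
    // attempts, and rethrow the last failure if every attempt fails.
    static Socket connectWithRetry(InetSocketAddress addr, int maxAttempts,
                                   int timeoutMillis, long pauseMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                socket.connect(addr, timeoutMillis);
                return socket;
            } catch (IOException e) {
                last = e;
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if closing a failed socket fails.
                }
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last != null ? last : new IOException("maxAttempts must be at least 1");
    }
}
```

The backlog passed to the socket is only a request. On Linux the kernel caps
it at the system-wide net.core.somaxconn setting, so a larger configured value
may also need that parameter raised.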
> Connections should not be cached
> --------------------------------
>
> Key: HADOOP-312
> URL: http://issues.apache.org/jira/browse/HADOOP-312
> Project: Hadoop
> Issue Type: Improvement
> Components: ipc
> Reporter: Devaraj Das
> Assigned To: Devaraj Das
> Attachments: no_conn_caching.patch, no_connection_caching.patch,
> no_connection_caching.patch
>
>
> Servers and clients (clients include datanodes, tasktrackers, DFSClients, and
> tasks) should not cache connections, or should cache them only for very short
> periods of time. Clients should set up and tear down connections to the
> servers every time they need to contact them (including for heartbeats). If a
> connection is cached, the client should reuse it for a few subsequent
> transactions until the connection expires. The heartbeat interval should be
> longer so that many more clients (on the order of tens of thousands) can be
> accommodated within one heartbeat interval.