I read it through. See my suggested changes attached.
Content-wise:
"This kind of use pattern is strongly discouraged because of the excessive and unnecessary garbage collection involved."
Is this really the problem? GC performance for short-lived objects is very good. I would rather point out that each HttpClient instance has its own connection pool, so connections are never reused across instances, which is very inefficient. Sharing one instance, and with it one connection pool, boosts performance instead.
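To illustrate the pooling argument, here is a toy model in plain Java. It is only a sketch: ToyClient and its methods are hypothetical stand-ins that I made up, not HttpClient API, and a "connection" is just a counter value. The assumption is that each client owns a private pool and a new socket is opened only when that pool has no idle connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PoolSketch {

    /** Hypothetical stand-in for a client with its own connection pool (not HttpClient API). */
    static final class ToyClient {
        // Total number of simulated sockets opened across all ToyClient instances.
        private static int opened = 0;

        // Idle connections owned by this client only.
        private final Deque<Integer> idle = new ArrayDeque<>();

        /** Simulates one request: reuse an idle connection if there is one, else open a new one. */
        void request() {
            Integer conn = idle.poll();
            if (conn == null) {
                conn = ++opened; // nothing to reuse in this pool, so open a new "socket"
            }
            idle.push(conn); // return the connection to this client's pool
        }

        static int opened() {
            return opened;
        }

        static void reset() {
            opened = 0;
        }
    }

    /** Anti-pattern: a fresh client (and therefore a fresh, empty pool) per request. */
    static int perRequestClients(int requests) {
        ToyClient.reset();
        for (int i = 0; i < requests; i++) {
            new ToyClient().request();
        }
        return ToyClient.opened();
    }

    /** Recommended pattern: one client, one pool, shared by all requests. */
    static int sharedClient(int requests) {
        ToyClient.reset();
        ToyClient client = new ToyClient();
        for (int i = 0; i < requests; i++) {
            client.request();
        }
        return ToyClient.opened();
    }

    public static void main(String[] args) {
        System.out.println("sockets opened, client per request: " + perRequestClients(100));
        System.out.println("sockets opened, shared client:      " + sharedClient(100));
    }
}
```

In this model the per-request variant opens one socket per request, while the shared client opens a single socket and keeps reusing it. The real connection manager is more involved (per-host limits, stale checks), but the reuse effect is the point I would make in the guide.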
--- performance.xml 2005-01-12 09:18:09.495084800 +0100
+++ performance2.xml 2005-01-12 09:41:05.233299200 +0100
@@ -1,64 +1,71 @@
<?xml version="1.0" encoding="ISO-8859-1"?>
-
<document>
-
<properties>
<title>HttpClient Performance Optimization Guide</title>
<author email="oleg -at- ural.ru">Oleg Kalnichevski</author>
<revision>$Id$</revision>
</properties>
-
<body>
-
<section name="Introduction">
- <p>
- Per default HttpClient is configured to provide maximum reliability and
HTTP standards
+ <p>
+ By default HttpClient is configured to provide maximum reliability and
HTTP standards
compliance rather than raw performance. There are several configuration
options and
- optimization techniques, which can significantly improve performance of
HttpClient.
+ optimization techniques which can significantly improve the performance
of HttpClient.
</p>
- <p>
+ <p>
There are also several anti-patterns that should be avoided to achieve
best results
using HttpClient.
</p>
-
<subsection name="Contents">
<ul>
- <li><a href="#Reuse of HttpClient instance">Reuse of HttpClient
instance</a></li>
- <li><a href="#Connection persistence">Connection persistence</a></li>
- <li><a href="#Concurrent execution of HTTP methods">Concurrent
execution of HTTP methods</a></li>
- <li><a href="#Request/Response entity streaming">Request/Response
entity streaming</a></li>
- <li><a href="#Expect-continue handshake">Expect-continue
handshake</a></li>
- <li><a href="#Stale connection check">Stale connection check</a></li>
- <li><a href="#Cookie processing">Cookie processing</a></li>
+ <li>
+ <a href="#Reuse of HttpClient instance">Reuse the HttpClient
instance</a>
+ </li>
+ <li>
+ <a href="#Connection persistence">Connection persistence</a>
+ </li>
+ <li>
+ <a href="#Concurrent execution of HTTP methods">Concurrent
execution of HTTP methods</a>
+ </li>
+ <li>
+ <a href="#Request/Response entity streaming">Request/Response
entity streaming</a>
+ </li>
+ <li>
+ <a href="#Expect-continue handshake">Expect-continue handshake</a>
+ </li>
+ <li>
+ <a href="#Stale connection check">Stale connection check</a>
+ </li>
+ <li>
+ <a href="#Cookie processing">Cookie processing</a>
+ </li>
</ul>
</subsection>
</section>
-
- <section name="Reuse of HttpClient instance">
- <p>
+ <section name="Reuse the HttpClient instance">
+ <p>
One of the most common and, unfortunately, detrimental anti-patterns is
an excessive
instantiation and disposal of HttpClient instances. In the most extreme
case a new
instance of HttpClient is created per each HTTP request. This kind of
use pattern is
strongly discouraged because of the excessive and unnecessary garbage
collection involved.
- When an instance of HttpClient goes out if scope and is marked for
garbage collection,
- usually along with it go out of scope all the parameters, the default
HTTP state, cookies,
+ When an instance of HttpClient is subject to garbage collection,
+ usually along with it go all the parameters, the default HTTP state,
cookies,
user credentials, the connection manager and most importantly HTTP
connections, some of
which may still be open. In the worst case scenario before the garbage
collection kicks
in there may be hundreds of open sockets leading to serious resource
problems.
</p>
- <p>
- Generally it is recommended to have just a single instance of
HttpClient per communication
+ <p>
+ Generally it is recommended to have just one single instance of
HttpClient per communication
component or even per application. However, if the application makes
use of HttpClient
only very infrequently and keeping an idle instance of HttpClient in
memory is not warranted,
- it is highly recommended to explicitly <a
-
href="apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#shutdown()">
+ it is highly recommended to explicitly <a
href="apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#shutdown()">
shut down</a> the multithreaded connection manager prior to disposing
the HttpClient instance, which will ensure proper closure of all HTTP
connections in the
connection pool.
</p>
</section>
<section name="Connection persistence">
- <p>
+ <p>
HttpClient always makes its best efforts to reuse connections. The
connection
persistence is always on per default and requires no configuration. If
the connection
persistence for some reason needs to be disabled, the best way to
achieve that is to
@@ -68,32 +75,32 @@
</p>
</section>
<section name="Concurrent execution of HTTP methods">
- <p>
+ <p>
If the application logic allows for execution of multiple HTTP requests
concurrently,
- for instance, multiple requests against different sites, or multiple
requests representing
- different user identities, the use of a dedicated thread per HTTP
session can result in a
+ (e.g. multiple requests against different sites, or multiple requests
representing
+ different user identities) the use of a dedicated thread per HTTP
session can result in a
significant performance gain. HttpClient is fully thread-safe when used
with a thread-safe
- connection manager such as <a
-
href="apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html">
+ connection manager such as <a
href="apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html">
MultiThreadedHttpConnectionManager</a>. Please note that each respective
thread of execution
must have a local instance of HttpMethod and can have a local instance
of HttpState or/and
HostConfiguration to represent a specific host configuration and
conversational state. At the
- same time HttpClient instance should be shared by all threads for
maximum efficiency.
+ same time HttpClient instance should be shared among all threads for
maximum efficiency.
</p>
- <p>
+ <p>
For details on using multiple threads with HttpClient please to the <a
href="threading.html">
HttpClient Threading Guide</a>.
</p>
</section>
<section name="Request/Response entity streaming">
- <p>
- HttpClient is capable of efficient request/response body streaming.
Large entities can be submitted
- or received without having to be buffered in memory. This is especially
critical if multiple HTTP
- methods may be executed concurrently. In this case the use of strings or
byte arrays to provide or
- consume request/response body may severely affect scalability or even
cause out of memory condition.
+ <p>
+ HttpClient is capable of efficient request/response body streaming.
Large entities may be submitted
+ or received without being buffered in memory. This is especially
critical if multiple HTTP
+ methods may be executed concurrently. While there are convenience
methods to deal with entities as
+ strings or byte arrays, their use is discouraged. Unless used carefully
they can easily lead to
+ out of memory conditions because they imply buffering of the complete
entity in memory.
</p>
- <p>
- <strong>Response streaming:</strong> It is recommended to consume the
HTTP response body as a stream of
+ <p>
+ <strong>Response streaming:</strong> It is recommended to consume the
HTTP response body as a stream of
characters using HttpMethod#getResponseBodyAsStream method. The use of
HttpMethod#getResponseBody and
HttpMethod#getResponseBodyAsString is strongly discouraged. These
methods will be deprecated in the future
release of HttpClient.
@@ -108,9 +115,9 @@
} finally {
httpget.releaseConnection();
}]]></source>
- </p>
- <p>
- <strong>Request streaming:</strong> Main difficulty one may encounter
when streaming request bodies is that
+ </p>
+ <p>
+ <strong>Request streaming:</strong> Main difficulty one may encounter
when streaming request bodies is that
sometimes entity enclosing methods need to be retried due to an
authentication failure or an I/O failure.
Obviously non-buffered entities cannot be reread and resubmitted. The
recommended approach is to create a custom
<a
href="apidocs/org/apache/commons/httpclient/methods/RequestEntity.html">RequestEntity</a>
capable of
@@ -154,25 +161,25 @@
File myfile = new File("myfile.txt");
PostMethod httppost = new PostMethod("/stuff");
httppost.setRequestEntity(new FileRequestEntity(myfile));]]></source>
- </p>
+ </p>
</section>
<section name="Expect-continue handshake">
- <p>
- The purpose of the 100 (Continue) status is to allow a client that is
sending a request message with
- a request body to determine if the origin server is willing to accept
the request (based on the
- request headers) before the client sends the request body. It may be
highly inefficient for the client
- to send the request body if the server will reject the request without
looking at the body.
+ <p>
+ The purpose of the HTTP 100 (Continue) status is to allow a client that
is sending a request message with
+ a request entity to determine if the origin server is willing to accept
the request (based on the
+ request headers) before the client sends the request entity. It is
highly inefficient for the client
+ to send the request entity if the server will reject the request without
looking at the body.
Authentication failures are the most common reason for the request to be
rejected based on the request
headers alone. Therefore, the use of 'Expect-continue' handshake is
especially recommended with
those target servers that require HTTP authentication. However, for
proxied requests caution
- must be exercised as older HTTP/1.0 proxies may be unable to correctly
handle the 'Expect-continue'
+ must be taken as older HTTP/1.0 proxies may be unable to correctly
handle the 'Expect-continue'
handshake.
</p>
</section>
<section name="Stale connection check">
- <p>
- HTTP specification permits both the client and the server to terminate
the persistent (kept alive)
- connection at any time without a notice to the counterpart, thus
rendering the connection invalid,
+ <p>
+ HTTP specification permits both the client and the server to terminate a
persistent (keep-alive)
+ connection at any time without notice to the counterpart, thus rendering
the connection invalid,
or stale. Per default prior to executing a request HttpClient performs a
check to determine if the
active connection is stale. The cost of this operation is about 15-30 ms
depending on JRE used.
Disabling stale connection check may result in slight performance
improvement, especially for small
@@ -181,13 +188,11 @@
</p>
</section>
<section name="Cookie processing">
- <p>
- If the application such as web spider does not need to maintain
conversational state with the state
- with the target server, a small performance gain can made by disabling
cookie processing. For details
+ <p>
+ If the application such as web spider does not need to maintain
conversational state with the target
+ server, a small performance gain can be made by disabling cookie
processing. For details
on cookie processing please to the <a href="cookies.html">HttpClient
Cookies Guide</a>.
</p>
</section>
-
</body>
-
</document>
