I read it through. See my suggested changes attached.


Content-wise:

"This kind of use pattern is strongly discouraged because of the excessive and unnecessary garbage collection involved."

Is this really the problem? GC performance for short-lived objects is very good. I would rather point out that each HttpClient instance has its own connection pool, so creating many instances prevents connections from being reused and makes things very inefficient. A shared connection pool, by contrast, can boost performance.
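
To make the connection-pool point concrete, here is a toy sketch in plain Java. It is not HttpClient code: ToyPool, ToyClient and PoolDemo are names made up for this illustration, and the model simply assumes, as argued above, that every client instance owns its own pool. It counts how many connections get opened for sequential requests to a single host.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a connection pool; counts how many connections get opened. */
class ToyPool {
    private final Map<String, Deque<Integer>> idleByHost = new HashMap<>();
    private int opened = 0;

    /** Hand out an idle connection for the host, or open a new one if none is idle. */
    int acquire(String host) {
        Deque<Integer> idle = idleByHost.computeIfAbsent(host, h -> new ArrayDeque<>());
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        opened++;
        return opened; // the counter doubles as a connection id
    }

    /** Return a connection to the pool so that a later request can reuse it. */
    void release(String host, int connection) {
        idleByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).push(connection);
    }

    int openedCount() {
        return opened;
    }
}

/** Toy stand-in for a client that owns its own pool (hypothetical, for illustration only). */
class ToyClient {
    final ToyPool pool = new ToyPool();

    void get(String host) {
        int connection = pool.acquire(host);
        // ... the request/response exchange would happen here ...
        pool.release(host, connection);
    }
}

class PoolDemo {
    /** Anti-pattern: a new client, and therefore a new empty pool, for every request. */
    static int openedWithClientPerRequest(int requests) {
        int total = 0;
        for (int i = 0; i < requests; i++) {
            ToyClient client = new ToyClient();
            client.get("example.org");
            total += client.pool.openedCount();
        }
        return total;
    }

    /** Recommended: one client whose pool is shared by all requests. */
    static int openedWithSharedClient(int requests) {
        ToyClient client = new ToyClient();
        for (int i = 0; i < requests; i++) {
            client.get("example.org");
        }
        return client.pool.openedCount();
    }

    public static void main(String[] args) {
        System.out.println("client per request: " + openedWithClientPerRequest(100));
        System.out.println("shared client:      " + openedWithSharedClient(100));
    }
}
```

In this model, 100 sequential requests open 100 connections with a client per request, but only one with a shared client. Real numbers will depend on keep-alive behaviour and on the connection manager in use.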

--- performance.xml     2005-01-12 09:18:09.495084800 +0100
+++ performance2.xml    2005-01-12 09:41:05.233299200 +0100
@@ -1,64 +1,71 @@
 <?xml version="1.0" encoding="ISO-8859-1"?>
-
 <document>
-  
   <properties>
     <title>HttpClient Performance Optimization Guide</title>
     <author email="oleg -at- ural.ru">Oleg Kalnichevski</author>
     <revision>$Id$</revision>
   </properties>
-
   <body>
-
     <section name="Introduction">
-     <p>
-       Per default HttpClient is configured to provide maximum reliability and 
HTTP standards 
+      <p>
+       By default HttpClient is configured to provide maximum reliability and 
HTTP standards 
        compliance rather than raw performance. There are several configuration 
options and
-       optimization techniques, which can significantly improve performance of 
HttpClient.
+       optimization techniques which can significantly improve the performance 
of HttpClient.
      </p>
-     <p>
+      <p>
        There are also several anti-patterns that should be avoided to achieve 
best results
        using HttpClient.
      </p>
-
       <subsection name="Contents">
         <ul>
-          <li><a href="#Reuse of HttpClient instance">Reuse of HttpClient 
instance</a></li>
-          <li><a href="#Connection persistence">Connection persistence</a></li>
-          <li><a href="#Concurrent execution of HTTP methods">Concurrent 
execution of HTTP methods</a></li>
-          <li><a href="#Request/Response entity streaming">Request/Response 
entity streaming</a></li>
-          <li><a href="#Expect-continue handshake">Expect-continue 
handshake</a></li>
-          <li><a href="#Stale connection check">Stale connection check</a></li>
-          <li><a href="#Cookie processing">Cookie processing</a></li>
+          <li>
+            <a href="#Reuse of HttpClient instance">Reuse the HttpClient 
instance</a>
+          </li>
+          <li>
+            <a href="#Connection persistence">Connection persistence</a>
+          </li>
+          <li>
+            <a href="#Concurrent execution of HTTP methods">Concurrent 
execution of HTTP methods</a>
+          </li>
+          <li>
+            <a href="#Request/Response entity streaming">Request/Response 
entity streaming</a>
+          </li>
+          <li>
+            <a href="#Expect-continue handshake">Expect-continue handshake</a>
+          </li>
+          <li>
+            <a href="#Stale connection check">Stale connection check</a>
+          </li>
+          <li>
+            <a href="#Cookie processing">Cookie processing</a>
+          </li>
         </ul>
       </subsection>
     </section>
-
-    <section name="Reuse of HttpClient instance">
-     <p>
+    <section name="Reuse the HttpClient instance">
+      <p>
        One of the most common and, unfortunately, detrimental anti-patterns is 
an excessive 
        instantiation and disposal of HttpClient instances. In the most extreme 
case a new 
        instance of HttpClient is created per each HTTP request. This kind of 
use pattern is 
        strongly discouraged because of the excessive and unnecessary garbage 
collection involved.
-       When an instance of HttpClient goes out if scope and is marked for 
garbage collection, 
-       usually along with it go out of scope all the parameters, the default 
HTTP state, cookies, 
+       When an instance of HttpClient becomes eligible for garbage collection, 
+       it usually takes with it all the parameters, the default HTTP state, 
cookies, 
        user credentials, the connection manager and most importantly HTTP 
connections, some of
        which may still be open. In the worst case scenario before the garbage 
collection kicks 
        in there may be hundreds of open sockets leading to serious resource 
problems.
      </p>
-     <p>
-       Generally it is recommended to have just a single instance of 
HttpClient per communication
+      <p>
+       Generally it is recommended to have just one single instance of 
HttpClient per communication
        component or even per application. However, if the application makes 
use of HttpClient 
        only very infrequently and keeping an idle instance of HttpClient in 
memory is not warranted,
-       it is highly recommended to explicitly <a 
-       
href="apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#shutdown()">
+       it is highly recommended to explicitly <a 
href="apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#shutdown()">
        shut down</a> the multithreaded connection manager prior to disposing
        the HttpClient instance, which will ensure proper closure of all HTTP 
connections in the
        connection pool. 
      </p>
     </section>
     <section name="Connection persistence">
-     <p>
+      <p>
        HttpClient always makes its best efforts to reuse connections. The 
connection 
        persistence is always on per default and requires no configuration. If 
the connection
        persistence for some reason needs to be disabled, the best way to 
achieve that is to 
@@ -68,32 +75,32 @@
      </p>
     </section>
     <section name="Concurrent execution of HTTP methods">
-     <p>
+      <p>
       If the application logic allows for execution of multiple HTTP requests 
concurrently, 
-      for instance, multiple requests against different sites, or multiple 
requests representing 
-      different user identities, the use of a dedicated thread per HTTP 
session can result in a 
+      (e.g. multiple requests against different sites, or multiple requests 
representing 
+      different user identities) the use of a dedicated thread per HTTP 
session can result in a 
       significant performance gain. HttpClient is fully thread-safe when used 
with a thread-safe
-      connection manager such as <a 
-      
href="apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html">
+      connection manager such as <a 
href="apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html">
       MultiThreadedHttpConnectionManager</a>. Please note that each respective 
thread of execution 
       must have a local instance of HttpMethod and can have a local instance 
of HttpState or/and
       HostConfiguration to represent a specific host configuration and 
conversational state. At the
-      same time HttpClient instance should be shared by all threads for 
maximum efficiency.
+      same time the HttpClient instance should be shared among all threads for 
maximum efficiency.
      </p>
-     <p>
+      <p>
       For details on using multiple threads with HttpClient please to the <a 
href="threading.html">
       HttpClient Threading Guide</a>.  
      </p>
     </section>
     <section name="Request/Response entity streaming">
-     <p>
-      HttpClient is capable of efficient request/response body streaming. 
Large entities can be submitted 
-      or received without having to be buffered in memory. This is especially 
critical if multiple HTTP 
-      methods may be executed concurrently. In this case the use of strings or 
byte arrays to provide or 
-      consume request/response body may severely affect scalability or even 
cause out of memory condition.
+      <p>
+      HttpClient is capable of efficient request/response body streaming. 
Large entities may be submitted 
+      or received without being buffered in memory. This is especially 
critical if multiple HTTP 
+      methods may be executed concurrently. While there are convenience 
methods to deal with entities as
+      strings or byte arrays their use is discouraged. Unless used carefully 
they can easily lead to
+      out of memory conditions because they imply buffering of the complete 
entity in memory.
      </p>
-     <p>
-       <strong>Response streaming:</strong> It is recommended to consume the 
HTTP response body as a stream of
+      <p>
+        <strong>Response streaming:</strong> It is recommended to consume the 
HTTP response body as a stream of
        characters using HttpMethod#getResponseBodyAsStream method. The use of 
HttpMethod#getResponseBody and 
        HttpMethod#getResponseBodyAsString is strongly discouraged. These 
methods will be deprecated in the future
        release of HttpClient.
@@ -108,9 +115,9 @@
   } finally {
     httpget.releaseConnection();
   }]]></source>
-     </p>
-     <p>
-       <strong>Request streaming:</strong> Main difficulty one may encounter 
when streaming request bodies is that
+      </p>
+      <p>
+        <strong>Request streaming:</strong> Main difficulty one may encounter 
when streaming request bodies is that
         sometimes entity enclosing methods need to be retried due to an 
authentication failure or an I/O failure. 
         Obviously non-buffered entities cannot be reread and resubmitted. The 
recommended approach is to create a custom 
         <a 
href="apidocs/org/apache/commons/httpclient/methods/RequestEntity.html">RequestEntity</a>
 capable of 
@@ -154,25 +161,25 @@
 File myfile = new File("myfile.txt");
 PostMethod httppost = new PostMethod("/stuff");
 httppost.setRequestEntity(new FileRequestEntity(myfile));]]></source>
-     </p>
+      </p>
     </section>
     <section name="Expect-continue handshake">
-     <p>
-      The purpose of the 100 (Continue) status is to allow a client that is 
sending a request message with 
-      a request body to determine if the origin server is willing to accept 
the request (based on the 
-      request headers) before the client sends the request body. It may be 
highly inefficient for the client
-      to send the request body if the server will reject the request without 
looking at the body. 
+      <p>
+      The purpose of the HTTP 100 (Continue) status is to allow a client that 
is sending a request message with 
+      a request entity to determine if the origin server is willing to accept 
the request (based on the 
+      request headers) before the client sends the request entity. It is 
highly inefficient for the client
+      to send the request entity if the server will reject the request without 
looking at the entity. 
       Authentication failures are the most common reason for the request to be 
rejected based on the request
       headers alone. Therefore, the use of 'Expect-continue' handshake is 
especially recommended with 
       those target servers that require HTTP authentication. However, for 
proxied requests caution
-      must be exercised as older HTTP/1.0 proxies may be unable to correctly 
handle the 'Expect-continue' 
+      must be taken as older HTTP/1.0 proxies may be unable to correctly 
handle the 'Expect-continue' 
       handshake.
      </p>
     </section>
     <section name="Stale connection check">
-     <p>
-      HTTP specification permits both the client and the server to terminate 
the persistent (kept alive) 
-      connection at any time without a notice to the counterpart, thus 
rendering the connection invalid,
+      <p>
+      The HTTP specification permits both the client and the server to terminate a 
persistent (keep-alive) 
+      connection at any time without notice to the counterpart, thus rendering 
the connection invalid,
       or stale. Per default prior to executing a request HttpClient performs a 
check to determine if the 
       active connection is stale. The cost of this operation is about 15-30 ms 
depending on JRE used.
       Disabling stale connection check may result in slight performance 
improvement, especially for small
@@ -181,13 +188,11 @@
      </p>
     </section>
     <section name="Cookie processing">
-     <p>
-      If the application such as web spider does not need to maintain 
conversational state with the state
-      with the target server, a small performance gain can made by disabling 
cookie processing. For details 
+      <p>
+      If an application such as a web spider does not need to maintain 
conversational state with the target
+      server, a small performance gain can be made by disabling cookie 
processing. For details 
       on cookie processing please to the <a href="cookies.html">HttpClient 
Cookies Guide</a>.
      </p>
     </section>
-
   </body>
-
 </document>
