I completely agree with you that we should all write standards-compliant 
web pages, CGI programs, servlets, etc.

Unfortunately, that is not always in our hands. Since nearly anybody can write 
PHP scripts today, clients that read from such scripts should be 
error-tolerant.

If you want to run an HTTP client against an arbitrary number of URLs pointing 
to unknown web servers and pages (as in my case: I am writing a web crawler), 
you must be able to guarantee a deadlock-free, fault-tolerant way of reading 
each page. Have you ever heard of "spider traps"?

Consider a simple sequence of HttpClient calls:

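import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;

public void test() throws IOException {
    HttpClient client = new HttpClient();
    HttpMethod m = new GetMethod("http://localhost/testfile.php");
    client.executeMethod(m);

    // --- byte limit as suggested in the discussion
    InputStream body = m.getResponseBodyAsStream();
    int limit = 10; // read at most the first ten bytes
    int i;
    for (i = 0; i < limit; i++) {
        int b = body.read();
        if (b < 0) {
            break; // end of stream reached before the limit
        }
    }
    System.err.println("Stopped reading at byte " + i);
    // ---

    m.releaseConnection();
}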

The following PHP scripts will cause HttpClient to hang (1) or to run out of 
memory (2):


Test 1: endless.php
<?php
set_time_limit(0); // 0 = no execution time limit
while(TRUE) {
  print "The UNIX time is ".time()."<br>\n";
  flush();
  sleep(1);
}
?>

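This causes the program to hang at "m.releaseConnection()", apparently because 
releasing the connection tries to consume the rest of the response body, 
which never ends.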
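
One way a crawler could protect itself against this kind of hang is to run 
each fetch on a worker thread and give up after a deadline. The following is 
only a minimal sketch using plain java.util.concurrent; the class name 
FetchWatchdog and the method runWithDeadline are made up for illustration and 
are not part of HttpClient.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HttpClient.
final class FetchWatchdog {

    // Runs the task on a daemon worker thread and waits at most timeoutMillis.
    // Throws TimeoutException if the task has not finished by then.
    static <T> T runWithDeadline(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fetch-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> result = worker.submit(task);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the worker; see the note below
            throw e;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

Note that interrupting a thread does not normally unblock a blocking socket 
read in java.io, so the caller would still have to close the underlying 
connection from the watchdog side to free the worker. The deadline only 
guarantees that the crawler itself moves on.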


Test 2: hang-in-headers.php
<?php
  // remember to set an adequate memory limit in php.ini
  // or use Apache's "asis"-feature instead of PHP

  $x = str_repeat("X",1024*1024*32); // send 32M of 'X'
  Header("HTTP/1.0 300 Multiple Choices");
  Header("Location: http://localhost/".$x);
?>

In this case the client crashes with an OutOfMemoryError, because a 32 MB 
header line usually does not fit into the JVM's heap while the headers are 
being parsed.
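
A client could defend against oversized headers by capping the length of each 
header line while reading it. The sketch below shows the idea on a plain 
InputStream; BoundedLineReader and the maxBytes parameter are hypothetical 
names chosen for this example, and this is not how HttpClient parses headers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
final class BoundedLineReader {

    // Reads one LF-terminated line and drops a trailing CR.
    // Returns null at end of stream if no bytes were read.
    // Throws IOException once the line grows beyond maxBytes
    // (the CR, if present, counts toward the limit).
    static String readLine(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) >= 0) {
            if (b == '\n') {
                return decode(buf);
            }
            if (buf.size() >= maxBytes) {
                throw new IOException("Header line exceeds " + maxBytes + " bytes");
            }
            buf.write(b);
        }
        return buf.size() == 0 ? null : decode(buf);
    }

    private static String decode(ByteArrayOutputStream buf) {
        byte[] raw = buf.toByteArray();
        int len = raw.length;
        if (len > 0 && raw[len - 1] == '\r') {
            len--; // strip the CR of a CRLF line ending
        }
        return new String(raw, 0, len, StandardCharsets.ISO_8859_1);
    }
}
```

With a limit of a few kilobytes per line, the 32 MB "Location" header from 
Test 2 would be rejected with an IOException after the first few kilobytes 
instead of being buffered completely.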

