I've been running into a recurring problem with the 2.0-rc2 release in the spider I'm working on: HttpParser throws an exception when a web server returns an extra byte. When this exception is thrown, none of the headers are returned, even though they all contained valid data.
An example packet from Ethereal is attached. As you can see, the server sends an extraneous byte (0x00) after the last header line, and that byte is what causes the problem.

I've attached a quick and dirty patch to fix this. There was already a test for a length < 1 in order to stop processing. Rather than specifically looking for this case, I simply changed the check to look for a length < 2, on the grounds that there could never be a valid header of one character anyway. The patch is against HEAD, but would probably apply cleanly to the 2.0-rc2 release.

Let me know what you think, and let me know if this is the wrong place to post this!

Andrew Buchanan
00000000  47 45 54 20 2f 3f 66 6e  3d 31 26 73 69 3d 38 34  GET /?fn =1&si=84
00000010  32 33 36 20 48 54 54 50  2f 31 2e 30 0d 0a 55 73  236 HTTP /1.0..Us
00000020  65 72 2d 41 67 65 6e 74  3a 20 48 75 67 68 43 72  er-Agent : HughCr
00000030  61 77 6c 65 72 2f 30 2e  36 0d 0a 48 6f 73 74 3a  awler/0. 6..Host:
00000040  20 69 6e 2e 70 61 79 63  6f 75 6e 74 65 72 2e 63   in.payc ounter.c
00000050  6f 6d 0d 0a 0d 0a                                 om....

00000000  48 54 54 50 2f 31 2e 30  20 33 30 32 20 52 65 73  HTTP/1.0  302 Res
00000010  6f 75 72 63 65 20 6d 6f  76 65 64 0d 0a 43 6f 6e  ource mo ved..Con
00000020  6e 65 63 74 69 6f 6e 3a  20 63 6c 6f 73 65 0d 0a  nection:  close..
00000030  53 65 72 76 65 72 3a 20  70 63 74 72 61 63 6b 64  Server:  pctrackd
00000040  2f 30 2e 39 0d 0a 50 33  50 3a 20 70 6f 6c 69 63  /0.9..P3 P: polic
00000050  79 72 65 66 3d 22 68 74  74 70 3a 2f 2f 77 77 77  yref="ht tp://www
00000060  2e 70 61 79 63 6f 75 6e  74 65 72 2e 63 6f 6d 2f  .paycoun ter.com/
00000070  77 33 63 2f 70 33 70 2e  78 6d 6c 22 2c 20 43 50  w3c/p3p. xml", CP
00000080  3d 22 4e 4f 4e 20 44 53  50 20 43 4f 52 20 44 45  ="NON DS P COR DE
00000090  56 20 50 53 41 20 4f 55  52 20 42 55 53 20 4e 41  V PSA OU R BUS NA
000000A0  56 20 53 54 41 20 50 52  45 22 0d 0a 53 65 74 2d  V STA PR E"..Set-
000000B0  43 6f 6f 6b 69 65 3a 20  70 63 74 72 61 63 6b 64  Cookie:  pctrackd
000000C0  3d 30 30 30 4b 61 43 30  31 34 47 68 72 30 30 32  =000KaC0 14Ghr002
000000D0  30 30 3b 20 70 61 74 68  3d 2f 3b 20 64 6f 6d 61  00; path =/; doma
000000E0  69 6e 3d 2e 70 61 79 63  6f 75 6e 74 65 72 2e 63  in=.payc ounter.c
000000F0  6f 6d 3b 20 65 78 70 69  72 65 73 3d 54 75 65 2c  om; expi res=Tue,
00000100  20 33 31 20 44 65 63 20  32 30 33 30 20 30 31 3a   31 Dec  2030 01:
00000110  30 30 3a 30 30 20 47 4d  54 0d 0a 4c 6f 63 61 74  00:00 GM T..Locat
00000120  69 6f 6e 3a 20 68 74 74  70 3a 2f 2f 77 77 77 2e  ion: htt p://www.
00000130  70 63 61 64 75 6c 74 2e  63 6f 6d 2f 3f 73 69 3d  pcadult. com/?si=
00000140  38 34 32 33 36 26 63 61  74 3d 32 0d 0a 00        84236&ca t=2...
Index: src/java/org/apache/commons/httpclient/HttpParser.java
===================================================================
RCS file: /home/cvspublic/jakarta-commons/httpclient/src/java/org/apache/commons/httpclient/HttpParser.java,v
retrieving revision 1.8
diff -u -r1.8 HttpParser.java
--- src/java/org/apache/commons/httpclient/HttpParser.java	15 Jul 2003 02:19:58 -0000	1.8
+++ src/java/org/apache/commons/httpclient/HttpParser.java	9 Jan 2004 20:26:42 -0000
@@ -170,7 +170,7 @@
         StringBuffer value = null;
         for (; ;) {
             String line = HttpParser.readLine(is);
-            if ((line == null) || (line.length() < 1)) {
+            if ((line == null) || (line.length() < 2)) {
                 break;
             }
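
To show the effect of the changed check outside of HttpClient, here is a minimal standalone sketch. It is not the actual HttpParser code: the class HeaderLoopSketch, its readLine and parseHeaderNames methods, and the minLength parameter are all made up for illustration, and the sample input is just the tail of the response above (two headers followed by the stray 0x00). It also ignores folded continuation lines, which the real parser has to handle.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the header loop; not the real HttpParser.
class HeaderLoopSketch {

    // Reads one line terminated by LF and drops a trailing CR.
    // Returns null if the stream is already at end of input.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') {
                break;
            }
            sb.append((char) b);
        }
        if (!sawAny) {
            return null;
        }
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') {
            sb.setLength(len - 1);
        }
        return sb.toString();
    }

    // Collects header names until a line shorter than minLength is seen.
    // minLength = 1 mirrors the old check, minLength = 2 the patched one.
    static List<String> parseHeaderNames(InputStream in, int minLength) throws IOException {
        List<String> names = new ArrayList<>();
        for (;;) {
            String line = readLine(in);
            if (line == null || line.length() < minLength) {
                break;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                // The one-character "\0" line ends up here with the old check.
                throw new IOException("Unable to parse header, length " + line.length());
            }
            names.add(line.substring(0, colon).trim());
        }
        return names;
    }

    // Tail of the captured response: two headers, then the stray NUL byte.
    static byte[] sampleTail() {
        String raw = "Connection: close\r\nServer: pctrackd/0.9\r\n\0";
        return raw.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        try {
            parseHeaderNames(new ByteArrayInputStream(sampleTail()), 1);
            System.out.println("old check: parsed without error");
        } catch (IOException e) {
            System.out.println("old check: " + e.getMessage());
        }
        List<String> names = parseHeaderNames(new ByteArrayInputStream(sampleTail()), 2);
        System.out.println("patched check: " + names);
    }
}
```

With the threshold at 1, the lone 0x00 is treated as a header line and the sketch fails on the missing colon, losing the headers already read. With the threshold at 2, the loop simply stops there and the valid headers are kept.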