In the spider I'm working on, I keep hitting a problem with the 2.0-rc2 
release: the HttpParser throws an exception when a web server returns an 
extra byte after the headers. When this exception is thrown, none of the 
headers are returned, even though they all contained valid data.


An example packet from Ethereal is attached.

As you can see, the server sends an extraneous byte (0x00) after the final 
CRLF of the response, and that byte is causing the problem.

I've attached a quick and dirty patch to fix this. There was already a test 
for a length < 1 that stops header processing. Rather than testing 
specifically for the stray byte, I simply changed the check to look for a 
length < 2, on the grounds that there could never be a valid header of one 
character anyway. The patch is against HEAD, but would probably apply 
cleanly to the 2.0-rc2 release.
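
To illustrate the effect of the changed check, here is a minimal, 
self-contained sketch. It is not the actual HttpParser code: the class 
HeaderLoopSketch and its methods readLine and readHeaderLines are 
hypothetical stand-ins, and the minLength parameter exists only so both 
thresholds can be compared side by side. With a threshold of 1, the stray 
0x00 comes back as a one-character "header" line that a real parser would 
then fail on; with a threshold of 2, the loop stops before it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the header-reading loop; not the real HttpParser.
class HeaderLoopSketch {

    // Reads bytes up to LF and strips a trailing CR.
    // Returns null only when the stream is already at its end.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') {
                break;
            }
            sb.append((char) b);
        }
        if (!sawAny) {
            return null;
        }
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') {
            sb.setLength(len - 1);
        }
        return sb.toString();
    }

    // Collects candidate header lines until a line shorter than minLength
    // (or end of stream) is seen. minLength = 1 mirrors the old check,
    // minLength = 2 mirrors the patched one.
    static List<String> readHeaderLines(InputStream in, int minLength)
            throws IOException {
        List<String> lines = new ArrayList<>();
        for (;;) {
            String line = readLine(in);
            if (line == null || line.length() < minLength) {
                break;
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Tail of a response like the captured one: headers, then a lone 0x00.
        byte[] raw = "Connection: close\r\nServer: pctrackd/0.9\r\n\0"
                .getBytes(StandardCharsets.ISO_8859_1);

        List<String> oldCheck = readHeaderLines(new ByteArrayInputStream(raw), 1);
        List<String> newCheck = readHeaderLines(new ByteArrayInputStream(raw), 2);

        System.out.println("length < 1 check: " + oldCheck.size() + " lines");
        System.out.println("length < 2 check: " + newCheck.size() + " lines");
    }
}
```

A normal blank line still ends the loop under either threshold, since its 
length is 0.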

Let me know what you think.

Let me know if this is the wrong place to post this!

Andrew Buchanan
00000000  47 45 54 20 2f 3f 66 6e  3d 31 26 73 69 3d 38 34 GET /?fn =1&si=84
00000010  32 33 36 20 48 54 54 50  2f 31 2e 30 0d 0a 55 73 236 HTTP /1.0..Us
00000020  65 72 2d 41 67 65 6e 74  3a 20 48 75 67 68 43 72 er-Agent : HughCr
00000030  61 77 6c 65 72 2f 30 2e  36 0d 0a 48 6f 73 74 3a awler/0. 6..Host:
00000040  20 69 6e 2e 70 61 79 63  6f 75 6e 74 65 72 2e 63  in.payc ounter.c
00000050  6f 6d 0d 0a 0d 0a                                om....
    00000000  48 54 54 50 2f 31 2e 30  20 33 30 32 20 52 65 73 HTTP/1.0  302 Res
    00000010  6f 75 72 63 65 20 6d 6f  76 65 64 0d 0a 43 6f 6e ource mo ved..Con
    00000020  6e 65 63 74 69 6f 6e 3a  20 63 6c 6f 73 65 0d 0a nection:  close..
    00000030  53 65 72 76 65 72 3a 20  70 63 74 72 61 63 6b 64 Server:  pctrackd
    00000040  2f 30 2e 39 0d 0a 50 33  50 3a 20 70 6f 6c 69 63 /0.9..P3 P: polic
    00000050  79 72 65 66 3d 22 68 74  74 70 3a 2f 2f 77 77 77 yref="ht tp://www
    00000060  2e 70 61 79 63 6f 75 6e  74 65 72 2e 63 6f 6d 2f .paycoun ter.com/
    00000070  77 33 63 2f 70 33 70 2e  78 6d 6c 22 2c 20 43 50 w3c/p3p. xml", CP
    00000080  3d 22 4e 4f 4e 20 44 53  50 20 43 4f 52 20 44 45 ="NON DS P COR DE
    00000090  56 20 50 53 41 20 4f 55  52 20 42 55 53 20 4e 41 V PSA OU R BUS NA
    000000A0  56 20 53 54 41 20 50 52  45 22 0d 0a 53 65 74 2d V STA PR E"..Set-
    000000B0  43 6f 6f 6b 69 65 3a 20  70 63 74 72 61 63 6b 64 Cookie:  pctrackd
    000000C0  3d 30 30 30 4b 61 43 30  31 34 47 68 72 30 30 32 =000KaC0 14Ghr002
    000000D0  30 30 3b 20 70 61 74 68  3d 2f 3b 20 64 6f 6d 61 00; path =/; doma
    000000E0  69 6e 3d 2e 70 61 79 63  6f 75 6e 74 65 72 2e 63 in=.payc ounter.c
    000000F0  6f 6d 3b 20 65 78 70 69  72 65 73 3d 54 75 65 2c om; expi res=Tue,
    00000100  20 33 31 20 44 65 63 20  32 30 33 30 20 30 31 3a  31 Dec  2030 01:
    00000110  30 30 3a 30 30 20 47 4d  54 0d 0a 4c 6f 63 61 74 00:00 GM T..Locat
    00000120  69 6f 6e 3a 20 68 74 74  70 3a 2f 2f 77 77 77 2e ion: htt p://www.
    00000130  70 63 61 64 75 6c 74 2e  63 6f 6d 2f 3f 73 69 3d pcadult. com/?si=
    00000140  38 34 32 33 36 26 63 61  74 3d 32 0d 0a 00       84236&ca t=2...
Index: src/java/org/apache/commons/httpclient/HttpParser.java
===================================================================
RCS file: /home/cvspublic/jakarta-commons/httpclient/src/java/org/apache/commons/httpclient/HttpParser.java,v
retrieving revision 1.8
diff -u -r1.8 HttpParser.java
--- src/java/org/apache/commons/httpclient/HttpParser.java	15 Jul 2003 02:19:58 -0000	1.8
+++ src/java/org/apache/commons/httpclient/HttpParser.java	9 Jan 2004 20:26:42 -0000
@@ -170,7 +170,7 @@
         StringBuffer value = null;
         for (; ;) {
             String line = HttpParser.readLine(is);
-            if ((line == null) || (line.length() < 1)) {
+            if ((line == null) || (line.length() < 2)) {
                 break;
             }
 
