2012/3/13 <ma...@apache.org>: > Author: markt > Date: Tue Mar 13 14:39:24 2012 > New Revision: 1300154 > > URL: http://svn.apache.org/viewvc?rev=1300154&view=rev > Log: > Add an HTTP header parser. The driver for this was an attempt to fix > https://issues.apache.org/bugzilla/show_bug.cgi?id=52811 > > Parsing HTTP headers as per RFC2616 is not always as simple as it first > appears. For headers that only use tokens the simple approach will normally > be sufficient. However, for the other headers, while simple code meets 99.9% > of cases, there are often some edge cases that make things far more > complicated. > > The purpose of this parser is to let the parser worry about the edge cases. > It provides strict parsing of HTTP header values assuming that wrapped header > lines have already been unwrapped. (The Tomcat header processing code does > the unwrapping.) > > The parser currently supports parsing of the following HTTP header values as > per RFC 2616: > - Content-Type > > Support for additional headers will be provided as required. A quick scan of > the Tomcat code base suggested a couple of places where using this parser may > be useful such as Ranges in the default servlet but there was not - at this > point - a compelling case for immediate replacement. The expectation is that > as problems are identified in header parsing, the fix will typically extend > this parser to support the problematic header and then use the parser rather > than custom code. > > Added: > tomcat/trunk/java/org/apache/tomcat/util/http/parser/ (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstAttribute.java > (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstMediaType.java > (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstParameter.java > (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstSubType.java > (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstType.java (with > props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstValue.java (with > props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParser.java > (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParser.jjt > (with props) > > tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserConstants.java > (with props) > > tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserTokenManager.java > (with props) > > tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserTreeConstants.java > (with props) > > tomcat/trunk/java/org/apache/tomcat/util/http/parser/JJTHttpParserState.java > (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/Node.java (with > props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/ParseException.java > (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/SimpleCharStream.java > (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/SimpleNode.java > (with props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/Token.java (with > props) > tomcat/trunk/java/org/apache/tomcat/util/http/parser/TokenMgrError.java > (with props) > tomcat/trunk/test/org/apache/tomcat/util/http/parser/ > tomcat/trunk/test/org/apache/tomcat/util/http/parser/TestMediaType.java > (with props) > Modified: > tomcat/trunk/.gitignore > tomcat/trunk/build.xml > tomcat/trunk/java/org/apache/coyote/Response.java >
Reviewing HttpParser.jjt, 1) It seems that quoted text handling ("<IN_QUOTED_TEXT> TOKEN" construct) is slightly wrong in its handling of backslash character. RFC2616: [[[ quoted-string = ( <"> *(qdtext | quoted-pair ) <"> ) qdtext = <any TEXT except <">> The backslash character ("\") MAY be used as a single-character quoting mechanism only within quoted-string and comment constructs. quoted-pair = "\" CHAR ]]] CHAR = <any US-ASCII character (octets 0 - 127)> So quoted-pair is '\' + any character 0-127, but from the jjt file I do not see it: I do not see how it allows "any CHAR" after "\". As far as I understand our implementation allows either double quote or QUOTED_TEXT_CHAR, but QUOTED_TEXT_CHAR != CHAR. It does not include "\", CTL characters, and wrongly includes character 128-255. 2) RFC2616: ch.3.7 Media Types says media-type = type "/" subtype *( ";" parameter ) type = token subtype = token and then "Linear white space (LWS) MUST NOT be used between the type and subtype, nor between an attribute and its value." I see that we allow LWS there, because our definitions of HTTP_TOKEN and QUOTED_STRING already include leading and trailing LWS in them, and thus there are trim() calls everywhere in the jjt file. I am OK with it though, because I do not see any problem from it, but maybe LWS could be removed from TOKEN and QUOTED_STRING constructs along with those trim() calls. 3) In the same chapter, "The type, subtype, and parameter attribute names are case- insensitive." I do not see where we allow "charset" parameter name to be case-insensitive. We allow lowercase only. -> AstMediaType#getCharset() -> AstMediaType#toStringNoCharset() 4) The "value" of a parameter can be quoted-string. We do not handle unquoting of charset parameter value. I am OK with leaving as is, because nobody asked. Best regards, Konstantin Kolinko --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org