2012/3/13  <ma...@apache.org>:
> Author: markt
> Date: Tue Mar 13 14:39:24 2012
> New Revision: 1300154
>
> URL: http://svn.apache.org/viewvc?rev=1300154&view=rev
> Log:
> Add an HTTP header parser. The driver for this was an attempt to fix 
> https://issues.apache.org/bugzilla/show_bug.cgi?id=52811
>
> Parsing HTTP headers as per RFC2616 is not always as simple as it first 
> appears. For headers that only use tokens the simple approach will normally 
> be sufficient. However, for the other headers, while simple code meets 99.9% 
> of cases, there are often some edge cases that make things far more 
> complicated.
>
> The purpose of this parser is to let the parser worry about the edge cases. 
> It provides strict parsing of HTTP header values assuming that wrapped header 
> lines have already been unwrapped. (The Tomcat header processing code does 
> the unwrapping.)
>
> The parser currently supports parsing of the following HTTP header values as 
> per RFC 2616:
>  - Content-Type
>
> Support for additional headers will be provided as required. A quick scan of 
> the Tomcat code base suggested a couple of places where using this parser may 
> be useful such as  Ranges in the default servlet but there was not - at this 
> point - a compelling case for immediate replacement. The expectation is that 
> as problems are identified in header parsing, the fix will typically extend 
> this parser to support the problematic header and then use the parser rather 
> than custom code.
>
> Added:
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/   (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstAttribute.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstMediaType.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstParameter.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstSubType.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstType.java   (with 
> props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstValue.java   (with 
> props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParser.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParser.jjt   
> (with props)
>    
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserConstants.java 
>   (with props)
>    
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserTokenManager.java
>    (with props)
>    
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserTreeConstants.java
>    (with props)
>    
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/JJTHttpParserState.java  
>  (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/Node.java   (with 
> props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/ParseException.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/SimpleCharStream.java 
>   (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/SimpleNode.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/Token.java   (with 
> props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/TokenMgrError.java   
> (with props)
>    tomcat/trunk/test/org/apache/tomcat/util/http/parser/
>    tomcat/trunk/test/org/apache/tomcat/util/http/parser/TestMediaType.java   
> (with props)
> Modified:
>    tomcat/trunk/.gitignore
>    tomcat/trunk/build.xml
>    tomcat/trunk/java/org/apache/coyote/Response.java
>

Reviewing HttpParser.jjt,

1)  It seems that quoted text handling ("<IN_QUOTED_TEXT> TOKEN"
construct) is slightly wrong in its handling of backslash character.

RFC2616:
[[[
       quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
       qdtext         = <any TEXT except <">>

   The backslash character ("\") MAY be used as a single-character
   quoting mechanism only within quoted-string and comment constructs.

       quoted-pair    = "\" CHAR
]]]
       CHAR           = <any US-ASCII character (octets 0 - 127)>

So quoted-pair is '\' + any character 0-127, but from the jjt file I
do not see it:
I do not see how it allows "any CHAR" after "\".

As far as I understand our implementation allows either double quote
or QUOTED_TEXT_CHAR, but QUOTED_TEXT_CHAR != CHAR. It does not include
"\", CTL characters,  and wrongly includes character 128-255.


2) RFC2616: ch.3.7 Media Types

says
       media-type     = type "/" subtype *( ";" parameter )
       type           = token
       subtype        = token
and then
"Linear white space
   (LWS) MUST NOT be used between the type and subtype, nor between an
   attribute and its value."

I see that we allow LWS there, because our definitions of HTTP_TOKEN
and QUOTED_STRING already include leading and trailing LWS in them,
and thus there are trim() calls everywhere in the jjt file.

I am OK with it though, because I do not see any problem from it, but
maybe LWS could be removed from TOKEN and QUOTED_STRING constructs
along with those trim() calls.


3) In the same chapter,

"The type, subtype, and parameter attribute names are case-
   insensitive."

I do not see where we allow "charset" parameter name to be
case-insensitive. We allow lowercase only.

-> AstMediaType#getCharset()
-> AstMediaType#toStringNoCharset()

4) The "value" of a parameter can be quoted-string. We do not handle
unquoting of charset parameter value.  I am OK with leaving as is,
because nobody asked.



Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to