One of the aims of the proposed cookie changes [1] was to deal with the
HTML 5 changes that mean UTF-8 can appear in cookie headers.

This has some potentially large implications for Tomcat.

Currently, Tomcat handles cookies as MessageBytes, processing everything
in bytes and only converting to String when necessary. This is largely
possible because of the assumption that everything is ASCII.

Introduce UTF-8 and processing everything in bytes gets a whole lot
harder. You essentially have to decode the bytes as UTF-8 to ensure that
you have valid data, at which point why not just use Strings anyway?
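To illustrate the validation point: on a plain JDK, a straightforward way
to check that raw header bytes are valid UTF-8 is to run a CharsetDecoder
configured to report malformed input, and the by-product of that check is
the decoded text itself. This is a minimal sketch using only
java.nio.charset; the class and method names are hypothetical and not part
of Tomcat.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CookieHeaderDecoder {

    /**
     * Decodes raw header bytes as UTF-8, rejecting malformed or unmappable
     * input. Hypothetical helper for illustration only.
     */
    public static String decodeStrict(byte[] raw) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // decode() throws on invalid input, so a successful return means the
        // bytes were valid UTF-8 and we already hold the decoded text.
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(raw));
        return chars.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] valid = "name=caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeStrict(valid));

        byte[] invalid = {'n', '=', (byte) 0xC3}; // truncated 2-byte sequence
        try {
            decodeStrict(invalid);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

Once the check has been paid for, keeping the String and parsing it
directly avoids decoding the same bytes a second time later.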

I am currently leaning towards removing much of the current cookie
header caching/recycling and doing something along the following lines:
- Lazy parsing as currently (but unless cookie-based session tracking is
  disabled, this is going to run on every request)
- Decode headers as UTF-8 into Strings
- Parse them with a new parser along the lines of o.a.t.u.http.parser
- Have that parser return an array of javax.servlet.http.Cookie objects
- Pass those to the app if/when requested
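The steps above could be sketched in Java roughly as follows. This is a
minimal illustration under several assumptions: LazyCookies and
SimpleCookie are hypothetical names, SimpleCookie stands in for
javax.servlet.http.Cookie so the example runs on a plain JDK, and the
splitting logic is a simplified reading of the RFC 6265 cookie-string
grammar (no quoted values, no cookie-octet validation).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LazyCookies {

    /** Hypothetical stand-in for javax.servlet.http.Cookie (name/value only). */
    public static final class SimpleCookie {
        private final String name;
        private final String value;

        SimpleCookie(String name, String value) {
            this.name = name;
            this.value = value;
        }

        public String getName() { return name; }
        public String getValue() { return value; }
    }

    private final List<byte[]> rawHeaders;
    private SimpleCookie[] parsed; // null until the app first asks for cookies

    public LazyCookies(List<byte[]> rawHeaders) {
        this.rawHeaders = rawHeaders;
    }

    /** Parses on the first call only; later calls return the cached array. */
    public SimpleCookie[] getCookies() {
        if (parsed == null) {
            List<SimpleCookie> result = new ArrayList<>();
            for (byte[] raw : rawHeaders) {
                // Decode the header bytes as UTF-8. This lenient decode turns
                // malformed sequences into U+FFFD; a strict decoder could be
                // used instead to reject them.
                String header = new String(raw, StandardCharsets.UTF_8);
                parseHeader(header, result);
            }
            parsed = result.toArray(new SimpleCookie[0]);
        }
        return parsed;
    }

    /** Simplified split of "name=value; name2=value2" into cookies. */
    private static void parseHeader(String header, List<SimpleCookie> out) {
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // no '=': skip the pair (one possible policy)
            }
            String name = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            if (!name.isEmpty()) {
                out.add(new SimpleCookie(name, value));
            }
        }
    }

    public static void main(String[] args) {
        List<byte[]> headers = List.of(
                "JSESSIONID=abc123; lang=fr".getBytes(StandardCharsets.UTF_8),
                "city=Z\u00fcrich".getBytes(StandardCharsets.UTF_8));
        LazyCookies cookies = new LazyCookies(headers);
        for (SimpleCookie c : cookies.getCookies()) {
            System.out.println(c.getName() + " -> " + c.getValue());
        }
    }
}
```

Caching the parsed array keeps the "parse once, if/when requested"
behaviour; as noted above, with cookie-based session tracking enabled the
first call will in practice happen on every request.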

In terms of handling RFC6265 and RFC2109, my plan is to have two parsers
that share as much code as possible, and to switch between them based on
the cookie header, with the expectation that 99.9% of cookies will be
parsed by the RFC6265 parser. We could add some options to this switching
to enable other parsers (e.g. a Netscape parser) to be used.
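One possible way to do the switching is to look at the first cookie-pair
of the header: RFC 2109 request headers begin with a $Version attribute
(e.g. $Version="1"; Customer="WILE_E_COYOTE"), whereas RFC 6265 defines no
such attribute. This sketch assumes that rule is sufficient;
CookieParserSelector and Dialect are hypothetical names, and a real
selector might also consult options such as enabling a Netscape parser.

```java
public class CookieParserSelector {

    /** Hypothetical identifiers for the parser to dispatch to. */
    public enum Dialect { RFC6265, RFC2109 }

    /**
     * Chooses a parser from the name of the first cookie-pair.
     * Assumption: a leading $Version attribute means RFC 2109; anything
     * else is handed to the RFC 6265 parser.
     */
    public static Dialect select(String header) {
        String first = header.split(";", 2)[0];
        int eq = first.indexOf('=');
        String name = (eq < 0 ? first : first.substring(0, eq)).trim();
        // RFC 2109 attribute names are case-insensitive.
        return name.equalsIgnoreCase("$Version") ? Dialect.RFC2109 : Dialect.RFC6265;
    }

    public static void main(String[] args) {
        String[] samples = {
            "$Version=\"1\"; Customer=\"WILE_E_COYOTE\"; $Path=\"/acme\"",
            "JSESSIONID=abc123; lang=fr"
        };
        for (String header : samples) {
            System.out.println(select(header) + " <- " + header);
        }
    }
}
```

Comparing the whole first name, rather than checking a "$Version" prefix,
avoids misrouting a cookie that merely happens to be named something like
$VersionX.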

I'd also like to keep the current cookie parsing implementation for now.
Until we are happy with the new parsing, the current implementation will
be the default. Once we are happy with the new parsing we can change the
default. We can add an option to switch between the current and the new
parsing.

Thoughts?


Mark


[1] https://wiki.apache.org/tomcat/Cookies
