https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #12 from Christopher Schultz <ch...@christopherschultz.net> 
2011-06-23 20:02:11 UTC ---
> > I suppose it's a fairly small set of encodings, but with little benefit,
> > there's no reason IMO to pre-populate.
>
> You're right; however, if I read the reports correctly, this is true only if
> charsets with valid names are used for the lookup. But every time there is a
> lookup for a non-existent Charset, the JVM-synchronized Charset.lookup() is
> called. Probably to speed this up, Konstantin Kolinko suggested caching
> charset misses.

Duh. I hadn't thought of spurious lookups causing their own synchronization
disasters.

Perhaps the invalid-charset cache could be limited in some way: MRU caches are
easy to build with the standard Java library.
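
A minimal sketch of such a bounded cache, assuming a LinkedHashMap in access
order that evicts its least recently used entry. The class name
BoundedMissCache and its methods are hypothetical and are not part of Tomcat:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: remembers at most maxEntries invalid charset names.
final class BoundedMissCache {
    private final Map<String, Boolean> misses;

    BoundedMissCache(final int maxEntries) {
        // accessOrder=true moves an entry to the tail on every get/put, so the
        // head of the map is always the least recently used entry.
        this.misses = Collections.synchronizedMap(
            new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    // Called after each insertion; returning true drops the head.
                    return size() > maxEntries;
                }
            });
    }

    /** Records that the JVM does not know this charset name. */
    void recordMiss(String name) {
        misses.put(name, Boolean.TRUE);
    }

    /** True if the name was recorded as a miss and has not been evicted. */
    boolean isKnownMiss(String name) {
        return misses.get(name) != null;
    }

    /** Current number of remembered misses. */
    int count() {
        return misses.size();
    }
}
```

Note that an access-ordered LinkedHashMap is modified even by get(), so the
wrapper lock is required; this trades the JVM-wide lock for a smaller one
rather than removing synchronization.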

> If a list of all available charsets were pre-populated, including their
> aliases, missing charsets could also be detected quickly.

True: if the encoding is not supported by the JVM, then it's invalid no matter
what. In that case, case normalization is probably a good thing to do: if it's
not in the cache (after normalization), then it's not valid... no reason to ever
call Charset.lookup() after startup.
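
A rough sketch of that idea, assuming lower-casing with Locale.ENGLISH as the
normalization step. The class CharsetTable and its lookup() method are
hypothetical names for illustration, not the actual B2CConverter code:

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: a lookup table filled once, then only read.
final class CharsetTable {
    // Lower-cased canonical names and aliases mapped to their Charset.
    private static final Map<String, Charset> TABLE = new HashMap<>();

    static {
        // Runs once at class initialization; afterwards the map is never
        // modified, so concurrent reads need no locking.
        for (Charset cs : Charset.availableCharsets().values()) {
            TABLE.put(cs.name().toLowerCase(Locale.ENGLISH), cs);
            for (String alias : cs.aliases()) {
                TABLE.put(alias.toLowerCase(Locale.ENGLISH), cs);
            }
        }
    }

    private CharsetTable() {
    }

    /** Returns the charset, or null if the JVM does not provide it. */
    static Charset lookup(String name) {
        if (name == null) {
            return null;
        }
        // Normalize the request the same way the keys were normalized.
        return TABLE.get(name.toLowerCase(Locale.ENGLISH));
    }
}
```

With this shape, a name such as "uTf-8" resolves to the same entry as "UTF-8",
and an unknown name is answered with null straight from the map, so no miss
cache is needed.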

> Well, on my Windows machine the longest alias (not canonical name) of a
> charset is "Extended_UNIX_Code_Packed_Format_for_Japanese", which consists of
> 39 mutable characters.

Wow.

> The current (trunk) implementation in
> o.a.tomcat.util.buf.B2CConverter.getCharset() does not normalize the name, so
> a client could send requests with 2^39 permutations in a Content-Type header
> (which would make 49 TiB of Charset strings) ;-)

My math might be wrong, too, but I believe that's only 512 GiB if names are
1 byte per char; I think Java uses 2 bytes per char, so it's 1 TiB.

You're right, though: that's pretty huge.
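
For anyone who wants to check the arithmetic, here is a small sketch. It
assumes every letter in the alias can be upper or lower case independently and
counts character data only, ignoring per-String object overhead. The class
AliasPermutations is a hypothetical name. Note that 2^39 bytes is 512 GiB,
which corresponds to one byte per variant; multiplying by the 45 characters of
the alias gives about 22.5 TiB at 1 byte per char and 45 TiB at 2 bytes per
char.

```java
// Hypothetical helper for checking the estimate; not part of Tomcat.
final class AliasPermutations {
    private AliasPermutations() {
    }

    /** Number of letters, i.e. characters whose case can be flipped. */
    static int mutableChars(String name) {
        int n = 0;
        for (int i = 0; i < name.length(); i++) {
            if (Character.isLetter(name.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    /** Number of distinct upper/lower-case spellings of the name. */
    static long variants(String name) {
        return 1L << mutableChars(name);
    }

    /** Bytes of character data for all variants; object overhead is ignored. */
    static long totalBytes(String name, int bytesPerChar) {
        return variants(name) * name.length() * bytesPerChar;
    }

    public static void main(String[] args) {
        String alias = "Extended_UNIX_Code_Packed_Format_for_Japanese";
        double tib = Math.pow(2, 40);
        System.out.println(mutableChars(alias) + " letters, "
                + variants(alias) + " variants");
        System.out.println(totalBytes(alias, 1) / tib + " TiB at 1 byte per char");
        System.out.println(totalBytes(alias, 2) / tib + " TiB at 2 bytes per char");
    }
}
```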

+1 to case normalization.
+1 to LUT pre-population.
-1 to LUT miss caching: it's totally unnecessary given the above.

