If you can show that a simple test program that appears to access only 2 charsets in fact causes accesses to 3 or 4 charsets, that is a serious problem with the 2-element cache.
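For instance, a trivial test along these lines (the class name and the exact calls are only illustrative) names ISO-8859-1 and IBM037 explicitly, yet the calls that name no charset must also go through the platform default, and the log Ulf quotes below suggests the VM requests UTF-8 on its own; how much of that actually reaches Charset.lookup() depends on the JDK's internal caching, which is exactly what a logged run like Ulf's would show:

    import java.io.UnsupportedEncodingException;

    public class TwoCharsetTest {
        public static void main(String[] args) throws UnsupportedEncodingException {
            // The only charsets this test names explicitly:
            byte[] latin1 = "Gruesse".getBytes("ISO-8859-1");
            String s = new String(latin1, "ISO-8859-1");
            byte[] ebcdic = s.getBytes("IBM037");

            // No charset named here, so the platform default (e.g. Cp1252 on a
            // German Windows box) has to be looked up as well:
            byte[] platform = s.getBytes();
            System.out.println(new String(platform) + " / " + ebcdic.length + " bytes");
        }
    }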
People at Google are working on better caches, but I don't think they are quite ready today.

Perhaps, instead of a small charset cache, we could cache all the charsets, but for the large charsets like GB18030 we could, inside the charset implementation, cache the large data tables using a soft reference and recompute them as needed. Then most of the static memory used by an unused charset could be reclaimed (a rough sketch of this idea appears after the quoted mail below).

In general, high-quality caching is hard, much harder than it looks.

Martin

On Wed, Oct 7, 2009 at 15:58, Ulf Zibis <ulf.zi...@gmx.de> wrote:
>> I don't think it's worth a point fix here unless an actual wrong result
>> can be demonstrated. I do think a more sophisticated charset cache
>> would be good, but hard to get right.
>
> The other point is the size of the cache, see
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6795535.
> I have logged the usage of the Charset.lookup() method from a simple test
> which only called ISO-8859-1 and IBM037. As you can see, UTF-8 and
> Cp1252 (the default encoding on German Windows) are frequently requested by
> the VM, so IMO size 2 is too restrictive (note the different aliases UTF-8,
> utf-8 and UTF8):
>
> UTF-8
> utf-8
> UTF-8
> Cp1252
> UTF-8
> UTF-8
> UTF-8
> UTF-8
> UTF-8
> UTF-8
> UTF8
> UTF8
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> Cp1252
> UTF-8
> IBM037
> UTF-8
> UTF-8
> utf-8
> ISO-8859-1
> UTF-8
>
> -Ulf
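A rough sketch of the soft-reference caching Martin describes above (the class name, method names and table layout are invented stand-ins for the real sun.nio.cs data structures):

    import java.lang.ref.SoftReference;

    final class GB18030Tables {
        // Cleared by the GC under memory pressure; rebuilt lazily on the next
        // use, so an unused charset eventually gives its large static data back.
        private static SoftReference<char[][]> tableRef;

        static synchronized char[][] tables() {
            char[][] tables = (tableRef != null) ? tableRef.get() : null;
            if (tables == null) {
                tables = buildTables();                       // recompute as needed
                tableRef = new SoftReference<char[][]>(tables);
            }
            return tables;
        }

        private static char[][] buildTables() {
            // Placeholder: the real charset would expand its packed mapping data here.
            return new char[][] { new char[0x10000] };
        }

        private GB18030Tables() { }
    }

Every encode/decode call would fetch the tables through tables(), paying the rebuild cost only after the GC has actually reclaimed them.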