On 29.03.2009 20:27, Martin Buchholz wrote:
On Fri, Mar 27, 2009 at 15:44, Ulf Zibis <ulf.zi...@gmx.de> wrote:
I have also coded such a test for a full-scan comparison:
See CharsetsTest + LegacyCharset (it retrieves the legacy charsets by
reflection directly from rt.jar of the patched JDK) here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/test/sun/nio/cs/
It cost me several nights to get all code points equal, given my special
mixture of range-limited direct maps and a full-range indirect map.
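For readers without the linked test sources, here is a minimal sketch of the full-scan idea. It assumes two ordinary Charset instances rather than the reflection-loaded legacy charset from rt.jar (that loader is not reproduced here). It decodes every single byte value and encodes every char U+0000..U+FFFF with both charsets, then counts the mismatches. The class and method names are hypothetical and are not those used in CharsetsTest.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not the CharsetsTest/LegacyCharset code from the thread.
public class FullScanCompare {

    /** Counts the byte values 0x00..0xFF that the two charsets decode differently. */
    public static int countDecodeDifferences(Charset a, Charset b) {
        int diffs = 0;
        for (int i = 0; i < 256; i++) {
            byte[] in = { (byte) i };
            // new String(byte[], Charset) substitutes the charset's replacement
            // string (normally U+FFFD) for malformed or unmappable input.
            if (!new String(in, a).equals(new String(in, b))) {
                diffs++;
            }
        }
        return diffs;
    }

    /** Counts the chars U+0000..U+FFFF that the two charsets encode differently. */
    public static int countEncodeDifferences(Charset a, Charset b) {
        int diffs = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            String s = String.valueOf((char) c);
            // getBytes(Charset) substitutes the encoder's replacement bytes
            // (typically '?') for unmappable chars and lone surrogates.
            if (!Arrays.equals(s.getBytes(a), s.getBytes(b))) {
                diffs++;
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        Charset ascii = StandardCharsets.US_ASCII;
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.println("decode differences: " + countDecodeDifferences(ascii, latin1));
        System.out.println("encode differences: " + countEncodeDifferences(ascii, latin1));
    }
}
```

Comparing US-ASCII with ISO-8859-1, as in main(), should flag exactly the bytes and chars 0x80..0xFF. When a patched single-byte charset is compared against its legacy twin, the goal is zero mismatches in both directions.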
It does look like you've written a lot of good tests.
It would be nice not to have an explicit list of charsets in
CharsetsTest.java.PARAMETERS.
The advantage of this list is that I can disable charsets by commenting
out lines, to speed up the test while debugging special cases.
I guess it's a list of charsets subject to single-byte testing?
Yes, plus charsets that depend on those. E.g. EUC-JP depends on JIS-X-0201.
If so, better documentation would be good.
Charsets named ISO-8859-* are guaranteed to be single-byte,
it might be good to include those programmatically,
by filtering Charset.availableCharsets().
Good idea, but how would I catch those that internally use single-byte
charsets, e.g. JIS-X-0201?
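Below is a minimal sketch of the filtering suggested above, using only java.nio.charset.Charset. The prefix check follows the ISO-8859-* rule. The second method is an added heuristic (an assumption, not something proposed in this thread): it keeps every charset whose encoder reports maxBytesPerChar() == 1.0. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for building a test's charset list programmatically.
public class SingleByteCharsets {

    /** All available charsets whose canonical name starts with "ISO-8859-". */
    public static List<Charset> iso8859Charsets() {
        List<Charset> result = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            if (cs.name().startsWith("ISO-8859-")) {
                result.add(cs);
            }
        }
        return result;
    }

    /**
     * Heuristic (assumption): charsets whose encoder never emits more than one
     * byte per char. Decode-only charsets are skipped, because newEncoder()
     * throws UnsupportedOperationException when canEncode() is false.
     */
    public static List<Charset> singleByteByEncoder() {
        List<Charset> result = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            if (cs.canEncode() && cs.newEncoder().maxBytesPerChar() == 1.0f) {
                result.add(cs);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-*: " + iso8859Charsets());
        System.out.println("max 1 byte/char: " + singleByteByEncoder().size() + " charsets");
    }
}
```

Neither method answers the JIS-X-0201 question. EUC-JP uses that table internally, but its encoder is multi-byte, so such dependent charsets would still have to be added to the list by hand.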
Why include EUC-JP but not UTF-8?
UTF-8 is not affected by my changes to single-byte charsets.
It's probably still a good idea to get inspiration from my Find*Bugs
tests, which test many other things like complete compatibility of
exceptions in case of invalid input.
I'll keep this in mind.
I see, this would affect our discussion about malformed().
Concerning the malformed length for an invalid low surrogate, I have now
understood your philosophy while hacking the UTF-8 coder. As a result I've
filed a bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6798515
Concerning \uFFFE and \uFFFF, I still think that they are invalid: these
code points have no valid meaning on the Java VM side, so why should they
be treated as possibly mappable to other character encodings?
Handling of the BOM etc. should be done elsewhere, e.g. at coder
initialization or in the flush() method.
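To make the current behavior concrete, here is a small probe (the class name is hypothetical). It asks a fresh encoder whether it can encode \uFFFE and \uFFFF, and it shows that the "UTF-16" charset emits its byte-order mark itself during encoding. The Charset documentation specifies that UTF-16 encoding uses big-endian order and writes a big-endian BOM. The canEncode results come from OpenJDK's ISO-8859-1 and UTF-8 encoders, so treat them as observed behavior rather than a specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical probe, not part of the thread's test code.
public class NonCharacterProbe {

    /** Reports whether a fresh encoder for cs claims it can encode c. */
    public static boolean encodable(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        Charset[] charsets = { StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8 };
        for (Charset cs : charsets) {
            System.out.printf("%s: U+FFFE=%b, U+FFFF=%b%n",
                    cs.name(), encodable(cs, (char) 0xFFFE), encodable(cs, (char) 0xFFFF));
        }
        // The "UTF-16" charset writes its byte-order mark while encoding,
        // i.e. the BOM comes from the coder, not from a U+FEFF in the input.
        byte[] bytes = "A".getBytes(StandardCharsets.UTF_16);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println("UTF-16 \"A\": " + hex.toString().trim());
    }
}
```

If \uFFFE and \uFFFF were declared invalid, the UTF-8 encoder's canEncode would have to start returning false for them, which is exactly the behavior change under discussion.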
The problem is more human. One would like to give credit for good ideas
or good analysis, but the only official way to give credit in a commit
message is via a simple
Contributed-by: email-address
which raises legal doubts even when there is no copyrighted material.
I guess one can abuse the Summary: field to squeeze in thank-yous,
but it's pretty obvious that you are circumventing the process.
The last paragraph is difficult for me to understand in English. Could
you please translate it?
-Ulf