On 27.03.2009 22:49, Martin Buchholz wrote:
Again, Ulf, I love the sort of stuff you're doing.
Many thanks again for the flowers. :-)
I hope to be able to contribute some engineering
to your effort myself someday.
In the meantime, we need some infrastructure to guarantee that
the behavior of the charsets is completely unchanged as we optimize.
I have some code left behind at Sun to do that, i.e. compare different
JDKs w.r.t. charset compatibility.
Hopefully Sun engineers can resurrect that code and perhaps put it
into a public Mercurial repo somewhere.
Another approach is to take the code in tests like my
Find{En,De}coderBugs.java tests which compare direct
vs. regular buffers, and retarget it to compare two different JDKs.
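
A minimal sketch of the direct-vs-regular buffer comparison mentioned above,
not the actual Find{En,De}coderBugs.java code. The class and method names
(DirectVsHeapSketch, decode, countMismatches) are hypothetical, and it assumes
Java 7 or later for the public java.nio.charset.StandardCharsets constants.
It only covers single-byte input; to compare two JDKs one could, for example,
print the decoded results under each JDK and diff the output.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical sketch: compares decoding through heap and direct buffers.
class DirectVsHeapSketch {

    /** Decodes one byte through the given buffer; null if the decoder reports an error. */
    static String decode(Charset cs, ByteBuffer in, int b) {
        in.clear();
        in.put((byte) b);
        in.flip();
        try {
            return cs.newDecoder()
                     .onMalformedInput(CodingErrorAction.REPORT)
                     .onUnmappableCharacter(CodingErrorAction.REPORT)
                     .decode(in)
                     .toString();
        } catch (CharacterCodingException e) {
            return null; // malformed or unmappable input
        }
    }

    /** Counts byte values whose decoding differs between a heap and a direct buffer. */
    static int countMismatches(Charset cs) {
        ByteBuffer heap = ByteBuffer.allocate(1);
        ByteBuffer direct = ByteBuffer.allocateDirect(1);
        int mismatches = 0;
        for (int b = 0; b < 256; b++) {
            if (!Objects.equals(decode(cs, heap, b), decode(cs, direct, b))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches(StandardCharsets.ISO_8859_1));
    }
}
```

A non-zero count would point at a charset whose direct-buffer code path
behaves differently from its array-backed one.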
I have also written such a test for full-scan comparison:
See CharsetsTest + LegacyCharset (it retrieves the legacy charsets by
reflection directly from rt.jar of the patched JDK) here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/test/sun/nio/cs/
It cost me several nights to get all code points to match, given my
special mixture of range-limited direct maps and full-range indirect maps.
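
For illustration, a full-scan comparison of this kind could look roughly like
the sketch below. It is not the CharsetsTest code from the repository: the
names FullScanSketch, encodeOrNull and countEncodeMismatches are hypothetical,
and instead of loading a legacy charset from rt.jar by reflection it simply
takes two Charset objects and compares their encoders over every non-surrogate
code point.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;

// Hypothetical sketch: full scan over all code points for two charsets.
class FullScanSketch {

    /** Encodes one code point; null if it is unmappable (encoders report errors by default). */
    static byte[] encodeOrNull(CharsetEncoder enc, int codePoint) {
        try {
            // encode(CharBuffer) resets the encoder before it starts.
            ByteBuffer bb = enc.encode(CharBuffer.wrap(Character.toChars(codePoint)));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Counts code points for which the two charsets produce different encodings. */
    static int countEncodeMismatches(Charset reference, Charset candidate) {
        CharsetEncoder refEnc = reference.newEncoder();
        CharsetEncoder candEnc = candidate.newEncoder();
        int mismatches = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue; // lone surrogates are not valid input
            }
            if (!Arrays.equals(encodeOrNull(refEnc, cp), encodeOrNull(candEnc, cp))) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

Calling countEncodeMismatches(legacy, enhanced) should return 0 when the
optimized charset is behaviorally identical to the reference. A matching scan
over all byte sequences would be needed to cover the decoder side as well.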
It's too difficult to give credit to external contributors.
One problem is that the Contributed-by: line is a red flag to
lawyers and other folks that might cause the legality of the change
to be questioned without end. Let's try to get Ulf a proper commit bit
and make sure the legal questions come to an end.
Aren't "Contributed-by" and "author" comments usual practice in open
source products?
Even in Sun's JRL sources the author was mentioned. I think the lawyer guys
and girls from Sun should rethink that subject.
Ok, we will see ...
Martin
On Fri, Mar 27, 2009 at 13:29, Ulf Zibis <ulf.zi...@gmx.de> wrote:
Hi folks,
Milestone 4 of the charset enhancement is released.
- I reduced the jar footprint of all single-byte charsets to 7 % of the
original JDK 6 binaries, which should also speed up class loading (not to
forget: encoder maps are lazily initialized), even though 21 specialized
coder algorithms have been added.
- In this release there is only 1 class <SingleByteCharset> for all
single-byte charsets, which reads the decoder mapping and all names, including
aliases, from a small data file (69..731 bytes, 250 bytes on average). This is
possible because numerous charsets can inherit their mappings (256 2-byte
chars) from each other, and empty or 1:1 ranges (especially \u0000..\u007F)
are filled in by the constructor.
- Additionally, a set of 7 Decoder and 14 Encoder classes do their work,
specially optimised for speed and memory for charsets with diverse
character distribution and frequency of occurrence. A special MapCalculator
class for playing with different parameters is provided in the test package.
- The aliases and historical names should no longer be statically and entirely
loaded, provided and linked from the StandardCharsets class. In addition, they
can easily be edited in the files standard-charsets and extended-charsets
(refer to Bug Id: 6795538). If some day they are defined entirely in upper
case, they could be omitted completely, as they already exist redundantly,
case-standardised, in the FastCharsetProvider lookup maps. Determining the
'contains()' references this way would also be reasonable (refer to Bug Id:
6761481), but containment of ASCII is already calculated automatically.
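
The map inheritance and the lazily initialized encoder map described in the
list above can be sketched as follows. This is only an illustration under
stated assumptions, not the SingleByteCharset class from the project: the class
name SingleByteMapSketch, its constructors and the override array format are
hypothetical, the real implementation reads its mapping from a data file, and
the HashMap stands in for the specialized encoder classes. The example derives
ISO-8859-15 from ISO-8859-1, which differ in eight positions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: inherited decoder map plus lazily built encoder map.
class SingleByteMapSketch {

    /** (byte value, char) pairs in which ISO-8859-15 differs from ISO-8859-1. */
    static final int[] LATIN9_OVERRIDES = {
        0xA4, 0x20AC, 0xA6, 0x0160, 0xA8, 0x0161, 0xB4, 0x017D,
        0xB8, 0x017E, 0xBC, 0x0152, 0xBD, 0x0153, 0xBE, 0x0178
    };

    private final char[] decodeMap;         // 256 entries, index = unsigned byte value
    private Map<Character, Byte> encodeMap; // built on first use (not thread-safe)

    /** Base charset: the constructor fills the 1:1 range, which yields ISO-8859-1. */
    SingleByteMapSketch() {
        decodeMap = new char[256];
        for (int i = 0; i < 256; i++) {
            decodeMap[i] = (char) i;
        }
    }

    /** Derived charset: inherit the parent's map, then apply only the differences. */
    SingleByteMapSketch(SingleByteMapSketch parent, int[] overrides) {
        decodeMap = parent.decodeMap.clone();
        for (int i = 0; i < overrides.length; i += 2) {
            decodeMap[overrides[i]] = (char) overrides[i + 1];
        }
    }

    char decode(byte b) {
        return decodeMap[b & 0xFF];
    }

    /** Returns the unsigned byte value for c, or -1 if c is unmappable. */
    int encode(char c) {
        if (encodeMap == null) { // lazy initialization on first encode
            encodeMap = new HashMap<Character, Byte>();
            for (int i = 0; i < 256; i++) {
                encodeMap.put(decodeMap[i], (byte) i);
            }
        }
        Byte b = encodeMap.get(c);
        return b == null ? -1 : (b & 0xFF);
    }

    boolean encoderInitialized() {
        return encodeMap != null;
    }
}
```

With this layout a derived charset only needs to store its differences from
the parent, and a program that only decodes never pays for the encoder map.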
See my project's home: ---> https://java-nio-charset-enhanced.dev.java.net/
I believe these techniques could also be used for most multi-byte charsets,
especially map inheritance to reduce the overall charset footprint.
Outlook for milestone 5: final performance optimisation through dedicated
inlining, exception catching, surrogate handling etc.
Urgently waiting for Christian Thalinger's
optimization of "widening conversions".
Happy Easter,
-Ulf
P.S.: I'm in the process of providing changesets slice by slice for OpenJDK 7.
BTW: Is there a way to add author and/or contributor annotations in the
sources to honour the effort invested by external collaborators (almost 1 year
in my case)?