Hi everyone,
Just thought I would give an update on how I've been preparing for my
GSoC work. I managed to get my environment set up and I've been
building some basic XNI components to get a feel for the code. I've also
been researching the different options for implementing the Unicode
normalization functions.
Here are some pros/cons of the various approaches that I've thought of:
ICU4J: [1]
(This is effectively the reference implementation of Unicode normalization)
Pros:
- Currently compiles with Java 1.3
- Is fully tested, including all the exceptional cases
- Implements 'quick check' optimizations, which allow you to parse
documents many times faster.
- License seems to be compatible with Xerces license.
- Normalization code can be built as a modular component, so you
don't need the whole ICU4J library.
Cons:
- Future versions of ICU4J are not guaranteed to compile with Java 1.3
- Requires an additional license file to be added to the distribution
- Adds a ~500 KB jar file to the build
Java Normalizer [2]
Pros:
- No additional libraries needed.
- Functionality is built into Java, so smaller file size.
- No license required.
Cons:
- Not available until Java 6 (java.text.Normalizer was added in 1.6)
- Doesn't implement 'quick check' optimizations, so it's much slower.
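For reference, here is a minimal sketch of what calling the built-in class from [2] could look like. It uses only the class's static normalize and isNormalized methods; the wrapper class and method names (NormalizerSketch, toNfc) are made up for illustration and are not part of Xerces.

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only.
class NormalizerSketch {
    // Returns the input in NFC, skipping the conversion when the
    // string is already normalized.
    static String toNfc(String input) {
        if (Normalizer.isNormalized(input, Normalizer.Form.NFC)) {
            return input;
        }
        return Normalizer.normalize(input, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'e' followed by COMBINING ACUTE ACCENT
        String decomposed = "e\u0301";
        String composed = toNfc(decomposed);
        // NFC composes the pair into the single code point U+00E9
        System.out.println(composed.equals("\u00E9")); // prints "true"
    }
}
```

Since this needs java.text.Normalizer, it would not compile on the older JDKs mentioned above.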
Build from scratch:
Pros:
- Complete control of source code
- Can ensure that code compiles with Java 1.3
Cons:
- Although the main functionality is fairly straightforward,
some legacy Unicode requirements and edge cases make implementing the code
pretty complicated.
- Additional code maintenance if the Unicode standard changes
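To give a flavour of the special cases a from-scratch implementation would have to cover, here is a rough sketch of one of them: Hangul syllables are decomposed arithmetically rather than through a lookup table. The constants follow the Hangul syllable decomposition algorithm in the Unicode Standard; the class and method names are hypothetical, and this covers only that one case, not a full normalizer.

```java
// Hypothetical helper, for illustration only.
class HangulDecomposer {
    static final int S_BASE = 0xAC00; // first precomposed Hangul syllable
    static final int L_BASE = 0x1100; // first leading consonant
    static final int V_BASE = 0x1161; // first vowel
    static final int T_BASE = 0x11A7; // one before the first trailing consonant
    static final int V_COUNT = 21;
    static final int T_COUNT = 28;
    static final int N_COUNT = V_COUNT * T_COUNT; // 588
    static final int S_COUNT = 19 * N_COUNT;      // 11172 syllables in total

    // Returns the canonical decomposition of a Hangul syllable,
    // or the character unchanged if it is not one.
    static String decompose(char syllable) {
        int sIndex = syllable - S_BASE;
        if (sIndex < 0 || sIndex >= S_COUNT) {
            return String.valueOf(syllable);
        }
        // StringBuffer rather than StringBuilder so this stays Java 1.3 friendly
        StringBuffer out = new StringBuffer();
        out.append((char) (L_BASE + sIndex / N_COUNT));
        out.append((char) (V_BASE + (sIndex % N_COUNT) / T_COUNT));
        int t = T_BASE + sIndex % T_COUNT;
        if (t != T_BASE) {
            out.append((char) t); // trailing consonant is optional
        }
        return out.toString();
    }
}
```

For example, decompose('\uAC00') yields the two jamo U+1100 U+1161. A complete implementation would also need the table-driven decompositions, canonical reordering and composition exclusions.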
I am leaning towards the first option (ICU4J) but welcome any other
input / comments before I decide.
If we go with ICU4J, its license needs to be attached. Would it be okay to
create a text file called "LICENSE.normalizer" to handle this requirement?
Thanks,
- Richard
[1] http://site.icu-project.org/
[2] http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html