Hi everyone,

Just thought I would give an update on how I've been preparing for my
GSoC work.  I managed to get my environment set up and I've been
building some basic XNI components to get a feel for the code.  I've
also been researching the different options for implementing the
Unicode normalization functions.  Here are some pros/cons of the
various approaches that I've thought of:

ICU4J: [1]
(This is effectively the reference implementation of Unicode normalization)
Pros:
  - Currently compiles with Java 1.3
  - Is fully tested, including all the exceptional cases
  - Implements 'quick check' optimizations, which allow you to process
documents many times faster.
  - License seems to be compatible with Xerces license.
  - Normalization code can be built as a modular component, so you
don't need the whole ICU4J library.
Cons:
  - Future versions of ICU4J are not guaranteed to compile with Java 1.3
  - requires an additional license file to be added to the distribution
  - adds a ~500kb jar file to the build
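
To illustrate the idea behind the 'quick check' optimization mentioned
above, here is a deliberately simplified sketch.  This is not ICU4J's
actual algorithm (ICU4J uses per-character property tables); it only
relies on the fact that pure ASCII text is already normalized in every
form, so it can be returned untouched.  The class and method names are
hypothetical, and the fallback assumes java.text.Normalizer [2] is
available.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not ICU4J or Xerces code.
class AsciiFastPathSketch {

    // True if every char is in the ASCII range (U+0000..U+007F).
    static boolean isAscii(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // ASCII-only input is already in NFC, so skip the normalizer entirely.
    // Anything else falls back to the full (slower) normalization.
    static String toNfc(String text) {
        if (isAscii(text)) {
            return text;
        }
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }
}
```

Since most XML markup is ASCII, even a crude check like this avoids the
full normalizer for a lot of input; a real quick check covers far more
characters than this.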


Java Normalizer [2]
Pros:
  - No additional libraries needed.
  - Functionality built into Java so smaller file size.
  - No license required.
Cons:
  - Not available until Java 6 (java.text.Normalizer was added in 1.6)
  - Doesn't implement 'quick check' optimizations so it's much slower.
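
For reference, a minimal sketch of what using the built-in class from [2]
looks like.  The wrapper class and method names are hypothetical; only
Normalizer.normalize, Normalizer.isNormalized and Normalizer.Form come
from the JDK.

```java
import java.text.Normalizer;

// Hypothetical wrapper around the JDK normalizer; not Xerces code.
class JdkNormalizerSketch {

    // Convert text to Normalization Form C (canonical composition).
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    // Check whether text is already in Normalization Form C.
    static boolean isNfc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
        String decomposed = "e\u0301";
        System.out.println(isNfc(decomposed));                   // false
        System.out.println(toNfc(decomposed).equals("\u00e9"));  // true
    }
}
```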


Build from scratch:
Pros:
  - Complete control of source code
  - Can ensure that code compiles with Java 1.3
Cons:
  - Although the main functionality is fairly straightforward,
    some legacy Unicode requirements and edge cases make implementing the code
    pretty complicated.
  - Additional code maintenance if the Unicode standard changes
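
As one example of the special cases a from-scratch implementation has to
handle: precomposed Hangul syllables are not decomposed through the
normal mapping tables but arithmetically.  The sketch below follows the
constants given in the Unicode Standard for Hangul syllable
decomposition; the class and method names are hypothetical and this
covers only that one case, not normalization in general.

```java
// Hypothetical illustration of Hangul syllable decomposition only.
class HangulDecompositionSketch {
    static final int S_BASE = 0xAC00;  // first precomposed syllable
    static final int L_BASE = 0x1100;  // first leading consonant
    static final int V_BASE = 0x1161;  // first vowel
    static final int T_BASE = 0x11A7;  // one before the first trailing consonant
    static final int V_COUNT = 21;
    static final int T_COUNT = 28;
    static final int N_COUNT = V_COUNT * T_COUNT;  // 588
    static final int S_COUNT = 19 * N_COUNT;       // 11172

    // Decompose one char; anything that is not a Hangul syllable is returned as-is.
    static String decompose(char c) {
        int sIndex = c - S_BASE;
        if (sIndex < 0 || sIndex >= S_COUNT) {
            return String.valueOf(c);
        }
        StringBuilder out = new StringBuilder(3);
        out.append((char) (L_BASE + sIndex / N_COUNT));
        out.append((char) (V_BASE + (sIndex % N_COUNT) / T_COUNT));
        int tIndex = sIndex % T_COUNT;
        if (tIndex != 0) {
            // Only syllables with a trailing consonant get a third char.
            out.append((char) (T_BASE + tIndex));
        }
        return out.toString();
    }
}
```

For instance, U+D55C decomposes to U+1112 U+1161 U+11AB, and U+AC00
(which has no trailing consonant) to U+1100 U+1161.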



I am leaning towards the first option (ICU4J) but welcome any other
input / comments before I decide.

In this case, since the ICU4J license needs to be attached, would it be okay to
create a text file called "LICENSE.normalizer" to handle this requirement?

Thanks,
- Richard

[1] http://site.icu-project.org/
[2] http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html
