Hi Richard,

Richard Kelly <[email protected]> wrote on 05/08/2009 02:12:56 AM:
> Hi everyone,
>
> Just thought I would give an update on how I've been preparing for my
> GSoC work. I managed to get my environment set up and I've been
> building some basic XNI components to get a feel for the code.

Sounds good.

> I've also been researching the different options for implementing
> the Unicode normalization functions. Here are some pros/cons of the
> various approaches that I've thought of:
>
> ICU4J: [1]
> (This is effectively the reference implementation of Unicode normalization.)
> Pros:
> - Currently compiles with Java 1.3.
> - Is fully tested, including all the exceptions.
> - Implements 'quick check' optimizations, which allow documents to be
>   processed many times faster.
> - License seems to be compatible with the Xerces license.

Yes, I think it is. It's been reviewed before on the legal-discuss list [3]
and I believe there are other Apache projects (e.g. Harmony [4]) that
already bundle it.

> - Normalization code can be built as a modular component, so you
>   don't need the whole ICU4J library.
> Cons:
> - Future versions of ICU4J are not guaranteed to compile with Java 1.3.
> - Requires an additional license file to be added to the distribution.
> - Adds a ~500 KB jar file to the build.
>
> Java Normalizer [2]
> Pros:
> - No additional libraries needed.
> - Functionality is built into Java, so the file size is smaller.
> - No license required.
> Cons:
> - Not available until Java 6.
> - Doesn't implement 'quick check' optimizations, so it's much slower.
>
> Build from scratch:
> Pros:
> - Complete control of the source code.
> - Can ensure that the code compiles with Java 1.3.
> Cons:
> - Although the main functionality is fairly straightforward, some
>   legacy Unicode requirements and edge cases make implementing the
>   code pretty complicated.
> - Additional code maintenance if the Unicode standards change.
>
> I am leaning towards the first option (ICU4J) but welcome any other
> input / comments before I decide.

+1.
I think that's the best choice of the three you've presented. There's no
sense reinventing the wheel (building it from scratch) if we don't need to,
and we can't depend on [2] because it's only available in Java 6.

> In this case, since the ICU4J license needs to be attached, would it be
> okay to create a text file called "LICENSE.normalizer" to handle this
> requirement?

Yes, that's exactly how we handle the licenses for other dependencies. It
should get included in the packages produced by the build.

> Thanks,
> - Richard
>
> [1] http://site.icu-project.org/
> [2] http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html

Thanks.

[3] http://markmail.org/thread/rkdg4u5ziusxnqat
[4] http://harmony.markmail.org/search/?q=ICU

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]
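For comparison with the ICU4J choice, here is a minimal sketch of what the JDK option [2] looks like, assuming Java 6 or later (which is exactly why it was ruled out for a Java 1.3 baseline). It uses only java.text.Normalizer's isNormalized() and normalize() methods; the class name NormalizationSketch and the helper toNfc() are hypothetical and not part of Xerces. The check-then-normalize pattern shows the idea the 'quick check' optimization speeds up: text that is already in NFC is returned untouched. The JDK documentation does not promise that isNormalized() uses a fast quick-check path internally, so this illustrates the pattern rather than the performance claim.

```java
import java.text.Normalizer;

/**
 * Hypothetical sketch (not Xerces code): NFC normalization using the JDK's
 * java.text.Normalizer, which only exists on Java 6 and later.
 */
public final class NormalizationSketch {

    private NormalizationSketch() {}

    /**
     * Returns the NFC form of the input. If the text is already in NFC
     * (the common case for most documents), it is returned unchanged and
     * no new string is built.
     */
    public static String toNfc(String text) {
        if (Normalizer.isNormalized(text, Normalizer.Form.NFC)) {
            return text;
        }
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        String composed = "\u00E9";    // precomposed LATIN SMALL LETTER E WITH ACUTE

        // The decomposed pair is composed into the single precomposed character.
        System.out.println(toNfc(decomposed).equals(composed)); // true
        // Already-normalized input is returned as the very same object.
        System.out.println(toNfc(composed) == composed);        // true
    }
}
```

ICU4J would replace these two JDK calls with its own normalization API while keeping Java 1.3 compatibility, at the cost of the extra jar and license file discussed above.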
