On Tue, Dec 6, 2011 at 8:45 AM, Nick Wellnhofer <[email protected]> wrote:
> What I still want to do is to incorporate the word break test cases from the
> Unicode website:
>
> http://www.unicode.org/Public/6.0.0/ucd/auxiliary/WordBreakTest.txt
>
We use a script that generates a unit test from this file. Maybe you can reuse some of the code for your purposes:

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/test/org/apache/lucene/analysis/core/generateJavaUnicodeWordBreakTest.pl

--
lucidimagination.com
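
For illustration, here is a minimal Python sketch of the kind of parsing a test generator for WordBreakTest.txt has to do. It is not the Perl script linked above and does not reflect how that script works; the function name `parse_test_line` is hypothetical. It assumes the usual format of the file: each test line is a sequence of hex code points separated by ÷ (break allowed) or × (no break), followed by a `#` comment.

```python
BREAK = "\u00f7"     # the "÷" marker: a word break is allowed here
NO_BREAK = "\u00d7"  # the "×" marker: no word break is allowed here


def parse_test_line(line):
    """Return the expected segments for one test line.

    Returns None for blank lines and comment-only lines.
    (Hypothetical helper, written for illustration only.)
    """
    # Everything after '#' is a human-readable comment.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None

    segments = []
    current = []
    for token in data.split():
        if token == BREAK:
            # A break closes the segment collected so far, if any.
            if current:
                segments.append("".join(current))
                current = []
        elif token == NO_BREAK:
            # No break: the next code point joins the current segment.
            continue
        else:
            # Any other token is a code point in hexadecimal.
            current.append(chr(int(token, 16)))
    if current:
        segments.append("".join(current))
    return segments


if __name__ == "__main__":
    # A made-up line in the file's format, not copied from the file.
    sample = "\u00f7 0061 \u00d7 0308 \u00f7 0020 \u00f7\t# example"
    print(parse_test_line(sample))
```

Each returned segment is a run of code points with no break inside it, so a test can join the segments to get the input text and compare the tokenizer's boundaries against the segment boundaries.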
