I would like to mention that we have been using Quex, a tool for generating tokenizers:
http://quex.sourceforge.net/
Quex is similar to Flex++ and generates C++ tokenizers, but it can also handle
text in various encodings, including UTF-8, and its regular expressions can use
Unicode properties.
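
To give a rough idea of what tokenizing by Unicode properties means, here is a
small Python sketch. It is not Quex syntax and not how Quex works internally; it
only uses the standard unicodedata module, and the names char_class and tokenize
are made up for this illustration.

```python
import unicodedata


def char_class(ch):
    """Map a character to a coarse class via its Unicode general category."""
    cat = unicodedata.category(ch)  # e.g. 'Lu', 'Nd', 'Po', 'Zs'
    if cat[0] in ("L", "M"):        # letters and combining marks
        return "word"
    if cat[0] == "N":               # numbers
        return "number"
    if cat[0] == "Z" or ch.isspace():
        return "space"
    return "other"                  # punctuation, symbols, and so on


def tokenize(text):
    """Group letters and digits into runs, drop whitespace,
    and emit every other character as a single-character token."""
    tokens = []
    current, current_class = "", None
    for ch in text:
        cls = char_class(ch)
        if cls == current_class and cls in ("word", "number"):
            current += ch           # extend the current run
        else:
            if current and current_class != "space":
                tokens.append(current)
            current, current_class = ch, cls
    if current and current_class != "space":
        tokens.append(current)
    return tokens


print(tokenize("Università di Pisa, 2013!"))
```

Because the classes come from Unicode categories rather than ASCII ranges, the
same rules apply to accented or non-Latin text without listing characters by
hand. A generated tokenizer would express this declaratively in its rule file
instead of in a hand-written loop.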
-- Beppe Attardi
Università di Pisa
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
