Some people express multibyte sequences directly in Ragel with a char or unsigned char alphtype. There is a contributed script in examples called unicode2ragel.rb that generates Ragel definitions for ranges of Unicode code points in UTF-8 or UCS-4.
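
To make the idea concrete, here is a minimal sketch of generating byte-level Ragel definitions from code points. It is not unicode2ragel.rb and makes no claim about how that script works: the function names (utf8_alternative, ragel_definition) and the definition name are made up for illustration, it assumes the target machine uses "alphtype unsigned char", and it naively emits one alternative per code point instead of merging byte ranges.

```python
def utf8_alternative(first, last):
    """Return a Ragel expression matching the UTF-8 bytes of every code
    point in first..last (inclusive), one alternative per code point.

    Naive on purpose: no merging of adjacent byte ranges, so this is only
    reasonable for small ranges.
    """
    alternatives = []
    for cp in range(first, last + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates have no valid UTF-8 encoding
        encoded = chr(cp).encode("utf-8")
        # Each byte becomes a hex literal; juxtaposition is Ragel concatenation.
        alternatives.append(" ".join(f"0x{b:02X}" for b in encoded))
    return " | ".join(f"( {alt} )" for alt in alternatives)


def ragel_definition(name, first, last):
    """Wrap the expression in a named Ragel definition (hypothetical name)."""
    return f"{name} = {utf8_alternative(first, last)};"


# U+00E8..U+00EB (e with grave, acute, circumflex, diaeresis)
print(ragel_definition("e_accented", 0xE8, 0xEB))
# prints: e_accented = ( 0xC3 0xA8 ) | ( 0xC3 0xA9 ) | ( 0xC3 0xAA ) | ( 0xC3 0xAB );
```

The printed line can be pasted into a machine that declares an unsigned char alphabet; with a signed char alphtype the byte values above 0x7F would need separate care.
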
As a side note, it should probably be in contrib. I'm going to move that now for anyone following the SVN directly.

-Adrian

Robert Lemmen wrote:
> On Thu, May 21, 2009 at 11:34:35AM -0400, Wil Macaulay wrote:
>> Depends on your platform, but my approach to this problem (on the Mac)
>> was to detect the encoding, and convert to UTF-8 before parsing. I also
>> converted line-endings (\r\n -> \n) and ensured a newline at the end of
>> the data at the same time.
>
> how do you handle utf-8 in your ragel code? do you use a single-byte
> alphtype and then handle the utf-8 sequences manually?
>
> cu robert

_______________________________________________
ragel-users mailing list
ragel-users@complang.org
http://www.complang.org/mailman/listinfo/ragel-users