On 2012-12-08 14:34, Daniel Naber wrote:
> On 30.11.2012, 11:57:28 Marcin Miłkowski wrote:
>
>> Alternatively, one could try developing a direct hunspell parser that
>> creates a graph by using the .aff file.
>
> I must admit that I have no clue how to approach that. Looking at
> FSABuilder it seems there are only bytes that can be added? What
> are those bytes? Or am I looking in the wrong place?
>
> http://morfologik.sourceforge.net/api/1.5.4/morfologik-fsa/morfologik/fsa/FSABuilder.html
>
> Note that I'm currently not planning to work on this, but I would like
> to understand this approach better.

The symbols in the graph are bytes (strictly speaking, they label the
arcs between states). Basically, you encode the characters to bytes in
whatever encoding you use, and build the automaton from those byte
sequences. In single-byte (8-bit) encodings each byte maps directly
onto one character, but in UTF-8 a single character may take several
bytes. However, a UTF-8 string is still just a stream of bytes, so it
makes no difference to the automaton.

Best,
Marcin
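
To illustrate the point about bytes, here is a minimal sketch in plain
Java. It is not Morfologik's FSABuilder and does no minimization; the
class ByteTrieSketch and its methods are hypothetical names made up for
this example. It only shows that a word is first encoded to bytes and
that the graph is then built over those bytes, whatever the encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

/** Illustrative byte-labeled trie; an assumption-laden sketch, not Morfologik code. */
public class ByteTrieSketch {
    /** One state; outgoing arcs are keyed by unsigned byte values (0..255). */
    static final class Node {
        final TreeMap<Integer, Node> arcs = new TreeMap<>();
        boolean isFinal;
    }

    final Node root = new Node();
    int nodeCount = 1; // counts the root

    /** Adds one byte sequence, creating states only where no arc exists yet. */
    void add(byte[] word) {
        Node n = root;
        for (byte b : word) {
            int label = b & 0xFF; // Java bytes are signed; use the unsigned value as label
            Node next = n.arcs.get(label);
            if (next == null) {
                next = new Node();
                n.arcs.put(label, next);
                nodeCount++;
            }
            n = next;
        }
        n.isFinal = true;
    }

    /** Returns true only if the exact byte sequence was added before. */
    boolean contains(byte[] word) {
        Node n = root;
        for (byte b : word) {
            n = n.arcs.get(b & 0xFF);
            if (n == null) return false;
        }
        return n.isFinal;
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "café", escaped to avoid source-encoding issues
        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1); // one byte per character
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);        // 'é' takes two bytes
        System.out.println(latin1.length + " bytes in ISO-8859-1, "
                + utf8.length + " bytes in UTF-8");

        ByteTrieSketch trie = new ByteTrieSketch();
        trie.add(utf8);
        trie.add("caf\u00e9s".getBytes(StandardCharsets.UTF_8));
        System.out.println(trie.contains(utf8));
    }
}
```

The lookup side has to encode its query with the same charset that was
used when building, otherwise the byte sequences will not match.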