On 2012-12-08 14:34, Daniel Naber wrote:
> On 30.11.2012, 11:57:28 Marcin Miłkowski wrote:
>
>> Alternatively, one could try developing a direct hunspell parser that
>> creates a graph by using the .aff file.
>
> I must admit that I have no clue how to approach that. Looking at
> FSABuilder, it seems that only bytes can be added? What
> are those bytes? Or am I looking in the wrong place?
>
> http://morfologik.sourceforge.net/api/1.5.4/morfologik-fsa/morfologik/fsa/FSABuilder.html
>
> Note that I'm currently not planning to work on this, but I would like
> to understand this approach better.

The arcs in the graph are labelled with bytes. Basically, you encode the 
characters into bytes using whatever encoding you choose, and build the 
automaton from those byte sequences.
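To make the idea concrete, here is a minimal sketch, assuming plain Java 9+ and no morfologik dependency. The class name `ByteTrie` and its methods are hypothetical. Each word is encoded to UTF-8 and inserted byte by byte, so every arc carries one byte. Unlike morfologik, this builds a simple trie and does not minimize it. The input is sorted in unsigned byte order only because incremental algorithms for minimal automata typically require sorted input; the trie itself does not need it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: a trie whose arcs are labelled with bytes. */
public class ByteTrie {
    private static final class Node {
        final Map<Byte, Node> arcs = new HashMap<>();
        boolean isFinal;
    }

    private final Node root = new Node();

    /** Encodes each word to UTF-8 and adds the resulting byte sequence. */
    public static ByteTrie build(List<String> words) {
        List<byte[]> sequences = new ArrayList<>();
        for (String w : words) {
            sequences.add(w.getBytes(StandardCharsets.UTF_8));
        }
        // Sort as unsigned bytes (Java bytes are signed, so plain comparison would be wrong).
        sequences.sort((a, b) -> Arrays.compareUnsigned(a, b));
        ByteTrie trie = new ByteTrie();
        for (byte[] seq : sequences) {
            trie.add(seq);
        }
        return trie;
    }

    /** Follows (or creates) one arc per byte and marks the last node as final. */
    private void add(byte[] seq) {
        Node node = root;
        for (byte b : seq) {
            node = node.arcs.computeIfAbsent(b, k -> new Node());
        }
        node.isFinal = true;
    }

    /** Looks a word up by walking its UTF-8 bytes through the arcs. */
    public boolean contains(String word) {
        Node node = root;
        for (byte b : word.getBytes(StandardCharsets.UTF_8)) {
            node = node.arcs.get(b);
            if (node == null) {
                return false;
            }
        }
        return node.isFinal;
    }

    public static void main(String[] args) {
        ByteTrie trie = build(List.of("kot", "k\u0105t", "koty"));
        System.out.println(trie.contains("k\u0105t")); // true
        System.out.println(trie.contains("ko"));       // false
    }
}
```

The lookup never deals with characters at all: "ą" simply becomes two consecutive arcs, which is why the same byte-based builder works for any encoding.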

In 8-bit encodings (such as ISO-8859-2), these bytes of course map 
directly onto characters, but UTF-8 is different, since one character may 
take several bytes. However, a UTF-8 string can also be encoded as a 
stream of bytes, so it makes no difference.
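A small sketch of that difference, assuming standard Java only; the class `EncodingBytes` and its helper are hypothetical. It uses ISO-8859-1 rather than ISO-8859-2 because only the former is guaranteed to be present in every JVM. The character "é" is one byte in the single-byte encoding and two bytes in UTF-8, yet the UTF-8 byte sequence still decodes back to the original string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Shows how one character maps to one or more bytes depending on the encoding. */
public class EncodingBytes {
    /** Unsigned byte values (0..255) of s in the given encoding. */
    public static int[] unsignedBytes(String s, Charset cs) {
        byte[] raw = s.getBytes(cs);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // Java bytes are signed; mask to 0..255
        }
        return out;
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // "é"
        System.out.println(Arrays.toString(unsignedBytes(e, StandardCharsets.ISO_8859_1))); // [233]
        System.out.println(Arrays.toString(unsignedBytes(e, StandardCharsets.UTF_8)));      // [195, 169]
        // Round trip: the byte path decodes back to the original string.
        byte[] path = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(path, StandardCharsets.UTF_8).equals("caf\u00e9")); // true
    }
}
```

A useful property here is that sorting UTF-8 byte sequences as unsigned bytes gives the same order as sorting by Unicode code points, so nothing special is needed when the builder expects sorted byte input.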

Best,
Marcin



_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
