Hi Daniel,

All the tools distributed with morfologik-stemming are essentially
wrappers around the core automaton classes, so you can build the
automaton in memory on the fly (as long as it isn't very large, this
will have no noticeable impact on performance). You could even merge a
few dictionaries into one (by decompressing them, sorting the entries
and compressing the result again). As long as the inputs are not too
large, the process should be very fast.
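
To make the "sort the entries, then build" step concrete, here is a
minimal sketch of the preparation that happens before anything is handed
to the automaton builder. It uses only the JDK. The class and method
names (EntryMerger, mergeEntries) are made up for illustration, and it
assumes plain one-word-per-line input such as ignore.txt, encoded as
UTF-8, with no dictionary-specific encoding applied. It also assumes the
builder wants entries sorted as unsigned bytes without duplicates;
please check that against the FSABuilder source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of morfologik-stemming.
public class EntryMerger {
    /** Compares two byte arrays as sequences of unsigned bytes. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        // A proper prefix sorts before the longer sequence.
        return a.length - b.length;
    }

    /** Merges word lists into one sorted, duplicate-free array of UTF-8 entries. */
    @SafeVarargs
    static byte[][] mergeEntries(List<String>... wordLists) {
        TreeSet<byte[]> sorted = new TreeSet<>(EntryMerger::compareUnsigned);
        for (List<String> words : wordLists) {
            for (String word : words) {
                String trimmed = word.trim();
                if (!trimmed.isEmpty()) {
                    sorted.add(trimmed.getBytes(StandardCharsets.UTF_8));
                }
            }
        }
        return sorted.toArray(new byte[0][]);
    }

    public static void main(String[] args) {
        byte[][] entries = mergeEntries(
            Arrays.asList("zebra", "cafe"),
            Arrays.asList("caf\u00e9", "apple", "cafe", " "));
        for (byte[] entry : entries) {
            System.out.println(new String(entry, StandardCharsets.UTF_8));
        }
        // 'entries' is the sorted input one would then pass to the automaton
        // builder (assumption: the exact call depends on the library version).
    }
}
```

The unsigned comparison matters for non-ASCII words: Java bytes are
signed, so a naive comparison would put the UTF-8 bytes of accented
characters before plain ASCII letters.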

https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-fsa/src/main/java/morfologik/fsa/FSABuilder.java

This is the class you will want to use in the end. Each dictionary has
its own encoding and format, which determine the byte array that is
added to the automaton for each entry; that logic would have to be
copied from other parts of the code (the encoders).

If you trace the usages of this class through the codebase, you should
be able to reimplement dictionary encoding with ease.

Dawid

On Mon, Jan 26, 2015 at 1:05 PM, Daniel Naber
<daniel.na...@languagetool.org> wrote:
> Hi,
>
> we offer the ignore.txt file that users can add words to that the spell
> checker should not complain about. However, we don't use these for
> creating suggestions for misspelled words yet. It would be nice to use
> Morfologik for this, too, to avoid developing a second suggestion
> algorithm. There doesn't seem to be a mailing list for Morfologik, so
> I'll ask here:
>
> I guess Morfologik has no way to add words to an existing dictionary at
> runtime?
>
> Would it make sense to create a second binary dictionary from the
> ignore.txt at startup? Would it be a good approach to do what
> Morfologik's FSABuildTool does, except that we don't want to serialize
> to a file, but keep everything in memory (it's small enough and
> temporary files are ugly)?
>
> The related bug report can be found at
> https://github.com/languagetool-org/languagetool/issues/231
>
> Regards
>   Daniel
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
