Hi Daniel,

All the tools distributed with morfologik-stemming are in essence wrappers around the core automaton classes, so you can build the automaton in memory on the fly; unless it is very large, this will have no noticeable impact on performance. You could even merge several dictionaries into one by decompressing them, sorting the entries, and recompressing the result. As long as the inputs are not too large, the process should be very fast.
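A minimal sketch of the merge-and-sort step, using only the JDK. The class name SortedEntries and its merge method are hypothetical, and UTF-8 is an assumption (a real dictionary may declare a different encoding). As far as I recall, the automaton builder expects its input sorted by unsigned byte value, so the sketch sorts that way; please verify this, and the exact build call, against the morfologik-fsa sources.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: prepares word lists for an in-memory automaton build. */
final class SortedEntries {
    /** Compares byte arrays lexicographically, treating each byte as unsigned. */
    static final Comparator<byte[]> UNSIGNED_BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    /** Merges word lists into one sorted, duplicate-free array of UTF-8 entries. */
    @SafeVarargs
    static byte[][] merge(List<String>... wordLists) {
        List<byte[]> entries = new ArrayList<>();
        for (List<String> words : wordLists) {
            for (String word : words) {
                String trimmed = word.trim();
                if (!trimmed.isEmpty()) {
                    // Assumption: plain UTF-8 words, no dictionary-specific encoding.
                    entries.add(trimmed.getBytes(StandardCharsets.UTF_8));
                }
            }
        }
        entries.sort(UNSIGNED_BYTE_ORDER);

        // Drop adjacent duplicates; after sorting, equal entries sit next to each other.
        List<byte[]> unique = new ArrayList<>();
        for (byte[] entry : entries) {
            if (unique.isEmpty() || !Arrays.equals(unique.get(unique.size() - 1), entry)) {
                unique.add(entry);
            }
        }
        return unique.toArray(new byte[0][]);
    }

    public static void main(String[] args) {
        // Stand-ins for, say, the words of ignore.txt and a second word list.
        byte[][] input = merge(Arrays.asList("zebra", "apple"), Arrays.asList("apple", "mango"));
        for (byte[] entry : input) {
            System.out.println(new String(entry, StandardCharsets.UTF_8));
        }
        // 'input' is what would be handed to the automaton builder
        // (assumed to be a call along the lines of FSABuilder.build(input)).
    }
}
```

The sketch deliberately stops before the builder call, so it runs without the morfologik jars on the classpath. For a real stemming or spelling dictionary, each entry would also need the dictionary's own encoding applied before sorting.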
https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-fsa/src/main/java/morfologik/fsa/FSABuilder.java

This is the class you will want to use in the end. Dictionaries have their own encodings and formats, which eventually produce the byte array for each entry added to the automaton; that logic would have to be copied from other parts of the code (the encoders). If you trace the uses of this class through the codebase, you should be able to reimplement dictionary encoding with ease.

Dawid

On Mon, Jan 26, 2015 at 1:05 PM, Daniel Naber <daniel.na...@languagetool.org> wrote:
> Hi,
>
> we offer the ignore.txt file that users can add words to that the spell
> checker should not complain about. However, we don't use these for
> creating suggestions for misspelled words yet. It would be nice to use
> Morfologik for this, too, to avoid developing a second suggestion
> algorithm. There doesn't seem to be a mailing list for Morfologik, so
> I'll ask here:
>
> I guess Morfologik has no way to add words to an existing dictionary at
> runtime?
>
> Would it make sense to create a second binary dictionary from the
> ignore.txt at startup? Would it be a good approach to do what
> Morfologik's FSABuildTool does, only that we don't want to serialize to
> a file, but keep everything in memory (it's small enough and temporary
> files are ugly).
>
> The related bug report can be found at
> https://github.com/languagetool-org/languagetool/issues/231
>
> Regards
> Daniel
>
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel