Re: The spell checker issue
Some time ago I started a project that used Hunspell from Java (I think at the time I used the library that is based on BridJ). It worked well, but I also needed custom dictionaries, and another interesting feature would be merging dictionaries.

I never had time to continue investigating it, but I remember it wasn't straightforward to add new words, and in some cases I'd have to use the Hunspell syntax to tell it about the possible variations of a word.

Having something in Java would definitely be helpful. So if someone starts a GitHub repo and has the skills to either port the existing Hunspell code (not sure about license issues) or write one from scratch based on papers, I'd be keen to take a look, help with testing, and maybe even send some pull requests :)

That'd be great not just for LT, but also for several other OSS projects. At the moment, when I need a simple dictionary and don't need all the features of Hunspell, I prefer to use Jazzy.

Cheers,
Bruno

From: Daniel Naber <daniel.na...@languagetool.org>
To: LanguageTool Developer List <languagetool-devel@lists.sourceforge.net>
Sent: Wednesday, 29 June 2016 9:28 PM
Subject: The spell checker issue

Hi,

yesterday I tried to update the English dictionary that LT includes. The details are documented at https://github.com/languagetool-org/languagetool/issues/329 but in a nutshell: our spell checking is so complicated that the dictionary update didn't work.

We could really use a process that allows us to use hunspell dictionaries directly, without conversion to other formats. The original reason we don't use hunspell (or only parts of it) is that it's slow, especially when it comes to generating suggestions. Today I ran a test with hunspell 1.4.1 and LT, and it turns out LT is about 4-5 times faster.

What could be a solution:

A) Improve hunspell to be faster. We'd need someone who can do this and then we'd still rely on native code, which isn't what we want in Java (but we've lived with it for years now).

B) Finally write a Java-based spell checker that can read hunspell dictionaries. The internet is full of spell checkers, but we need one with support for advanced features like compound words (important for German).

C) I don't know, do you have an idea?

If we cannot find a solution, the current situation will persist so that some dictionaries probably won't be updated.

Regards
Daniel

This is the text for testing, full of typos (supposed to be German):

Fgen Siex hxier Ixhren Txext eiwen. Klcken ie nch dr Prüung aug diw fatbig unteelegten Textstellwn. oder notzen Sie desen Teyt alls Beeispiel füür eein Paat Fwhler , diw LanguageTool erkwnnen ksnn: Ih wirde Ankst und banke.

--
Attend Shape: An AT Tech Expo July 15-16. Meet us at AT Park in San Francisco, CA to explore cutting-edge tech and listen to tech luminaries present their vision of the future. This family event has something for everyone, including kids. Get more information and register today. http://sdm.link/attshape
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
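
Bruno's reply mentions having to use the Hunspell syntax to describe the possible variations of a word. As a rough illustration of what that involves, the sketch below models a single suffix rule from an affix file in Java. The class name `SuffixRule` and its methods are hypothetical and not taken from Hunspell or LanguageTool, and the sketch assumes only the simple case: single-character flags, a strip string, an add string, and a bracket-class condition.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified model of one Hunspell suffix rule line such as
//   SFX S y ies [^aeiou]y
// (flag S: strip "y" and add "ies" if the word ends in a non-vowel followed by "y").
class SuffixRule {
    final String strip;       // characters removed from the end ("0" in the file means none)
    final String add;         // characters appended ("0" in the file means none)
    final Pattern condition;  // condition that the end of the word must match

    SuffixRule(String strip, String add, String condition) {
        this.strip = strip.equals("0") ? "" : strip;
        this.add = add.equals("0") ? "" : add;
        // Assumption: the condition only uses "." and bracket classes, which are
        // also valid Java regex syntax, so it can be reused directly.
        this.condition = Pattern.compile(".*" + condition);
    }

    // Returns the derived form, or null if the rule does not apply to the word.
    String apply(String word) {
        if (!word.endsWith(strip) || !condition.matcher(word).matches()) {
            return null;
        }
        return word.substring(0, word.length() - strip.length()) + add;
    }
}
```

With a dictionary entry flagged `S`, a custom word would pick up its variations from rules like this one, which is why adding a word can mean choosing the right flags rather than just appending a line. A real implementation would also have to handle prefixes, cross-products, compounding and the other flag encodings, none of which is covered here.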
Re: The spell checker issue
Daniel Naber wrote:

> Hi,
>
> yesterday I tried to update the English dictionary that LT includes. The
> details are documented at
> https://github.com/languagetool-org/languagetool/issues/329 but in a
> nutshell: our spell checking is so complicated that the dictionary
> update didn't work.
>
> We could really use a process that allows us to use hunspell
> dictionaries directly, without conversion to other formats. The original
> reason we don't use hunspell (or only parts of it) is that it's slow,
> especially when it comes to generating suggestions. Today I ran a test
> with hunspell 1.4.1 and LT, and it turns out LT is about 4-5 times
> faster.
>
> What could be a solution:
>
> A) Improve hunspell to be faster. We'd need someone who can do this and
> then we'd still rely on native code, which isn't what we want in Java
> (but we've lived with it for years now).
>
> B) Finally write a Java-based spell checker that can read hunspell
> dictionaries. The internet is full of spell checkers, but we need one
> with support for advanced features like compound words (important for
> German).
>
> C) I don't know, do you have an idea?
>
> If we cannot find a solution, the current situation will persist so that
> some dictionaries probably won't be updated.

If Hunspell is thread-safe (?), could we look up suggestions for multiple words in parallel, using multiple threads?

Dominique
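
Dominique's idea of looking up suggestions for several words at once could be sketched roughly as follows. This is only an illustration: `ParallelSuggestions` and `suggestAll` are hypothetical names, and the `suggester` function stands in for whatever call produces suggestions for a single word. The sketch assumes that call is safe to use from several threads at the same time, which is exactly the open question raised in the message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of LanguageTool or any Hunspell binding.
class ParallelSuggestions {

    // Runs one suggestion lookup per word on a fixed-size thread pool and
    // returns the results keyed by word, in the order the words were given.
    static Map<String, List<String>> suggestAll(List<String> words,
                                                Function<String, List<String>> suggester,
                                                int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String word : words) {
                // Assumption: suggester.apply may be called concurrently.
                Callable<List<String>> task = () -> suggester.apply(word);
                futures.add(pool.submit(task));
            }
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (int i = 0; i < words.size(); i++) {
                // get() blocks until the lookup for this word has finished.
                result.put(words.get(i), futures.get(i).get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

If the underlying checker turns out not to be thread-safe, the same structure would still work with one checker instance per worker thread, at the cost of loading the dictionary several times.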
The spell checker issue
Hi,

yesterday I tried to update the English dictionary that LT includes. The details are documented at https://github.com/languagetool-org/languagetool/issues/329 but in a nutshell: our spell checking is so complicated that the dictionary update didn't work.

We could really use a process that allows us to use hunspell dictionaries directly, without conversion to other formats. The original reason we don't use hunspell (or only parts of it) is that it's slow, especially when it comes to generating suggestions. Today I ran a test with hunspell 1.4.1 and LT, and it turns out LT is about 4-5 times faster.

What could be a solution:

A) Improve hunspell to be faster. We'd need someone who can do this and then we'd still rely on native code, which isn't what we want in Java (but we've lived with it for years now).

B) Finally write a Java-based spell checker that can read hunspell dictionaries. The internet is full of spell checkers, but we need one with support for advanced features like compound words (important for German).

C) I don't know, do you have an idea?

If we cannot find a solution, the current situation will persist so that some dictionaries probably won't be updated.

Regards
Daniel

This is the text for testing, full of typos (supposed to be German):

Fgen Siex hxier Ixhren Txext eiwen. Klcken ie nch dr Prüung aug diw fatbig unteelegten Textstellwn. oder notzen Sie desen Teyt alls Beeispiel füür eein Paat Fwhler , diw LanguageTool erkwnnen ksnn: Ih wirde Ankst und banke.
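
As a starting point for option B, reading the word list of a hunspell dictionary directly could look something like the sketch below. It is a minimal illustration, not LanguageTool code: `DicReader` is a hypothetical name, and the sketch assumes the plain `.dic` layout (an entry count on the first line, then one `word/FLAGS` entry per line) with no spaces inside words and no escaped slashes. It does not read the affix file, so compound rules and affixes are left out entirely.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for the word list part of a hunspell dictionary.
class DicReader {

    // Returns a map from each word to its raw flag string ("" if it has none).
    static Map<String, String> read(BufferedReader in) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        String line = in.readLine(); // first line holds the entry count, skipped here
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Assumption: anything after the first whitespace is a
            // morphological field, which this sketch ignores.
            String entry = line.split("\\s+", 2)[0];
            int slash = entry.indexOf('/');
            String word = slash < 0 ? entry : entry.substring(0, slash);
            String flags = slash < 0 ? "" : entry.substring(slash + 1);
            // A word listed twice keeps the flags of both entries, concatenated.
            entries.merge(word, flags, (a, b) -> a + b);
        }
        return entries;
    }
}
```

Because entries are combined with `merge`, reading a second word list into the same map would be one simple way to combine a custom dictionary with a stock one, assuming both rely on the same affix file so that their flags mean the same thing.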