Some time ago I started a project that used Hunspell from Java (I think at 
the time I used the library that is based on BridJ).
It worked well, but I also needed custom dictionaries, and another 
interesting feature would have been merging dictionaries.
I never had time to continue investigating it, but I remember it wasn't 
straightforward to add new words, and in some cases I'd have had to use the 
Hunspell syntax to tell it about the possible variations of a word.
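To illustrate what that syntax involves: a Hunspell dictionary entry such as "work/DG" names a stem plus flags, and each flag selects suffix rules (strip, add, condition) from the affix file. The following is only a minimal sketch of that idea, not Hunspell's actual algorithm. The names AffixSketch, SuffixRule and DEMO_RULES are hypothetical, and the rules are hand-written assumptions loosely modelled on English suffix rules. Prefixes, compounding, multi-character flags and everything else Hunspell supports are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hunspell or LanguageTool.
final class AffixSketch {

    /** Simplified suffix rule: if the stem matches the condition, drop `strip` from its end and append `add`. */
    static final class SuffixRule {
        final char flag;
        final String strip;      // empty string means "strip nothing" (written as 0 in .aff files)
        final String add;
        final Pattern condition; // matched against the end of the stem

        SuffixRule(char flag, String strip, String add, String condition) {
            this.flag = flag;
            this.strip = strip;
            this.add = add;
            this.condition = Pattern.compile(condition + "$");
        }
    }

    // Assumed demo rules: flag D builds a past form, flag G builds an -ing form.
    static final List<SuffixRule> DEMO_RULES = List.of(
            new SuffixRule('D', "", "d", "e"),
            new SuffixRule('D', "y", "ied", "[^aeiou]y"),
            new SuffixRule('D', "", "ed", "[^ey]"),
            new SuffixRule('D', "", "ed", "[aeiou]y"),
            new SuffixRule('G', "e", "ing", "e"),
            new SuffixRule('G', "", "ing", "[^e]"));

    /** Expands an entry like "work/DG" into the stem plus every form the matching rules produce. */
    static List<String> expand(String dicEntry, List<SuffixRule> rules) {
        String[] parts = dicEntry.split("/", 2);
        String stem = parts[0];
        String flags = parts.length > 1 ? parts[1] : "";
        List<String> forms = new ArrayList<>();
        forms.add(stem);
        for (char flag : flags.toCharArray()) {
            for (SuffixRule rule : rules) {
                if (rule.flag == flag
                        && stem.endsWith(rule.strip)
                        && rule.condition.matcher(stem).find()) {
                    String base = stem.substring(0, stem.length() - rule.strip.length());
                    forms.add(base + rule.add);
                }
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        System.out.println(expand("work/DG", DEMO_RULES)); // [work, worked, working]
        System.out.println(expand("try/D", DEMO_RULES));   // [try, tried]
        System.out.println(expand("bake/DG", DEMO_RULES)); // [bake, baked, baking]
    }
}
```

Adding a custom word therefore means choosing the right flags for it, which is the part that makes it less simple than appending a line to a word list.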
Having something in Java would definitely be helpful. So if someone starts a 
GitHub repo and has the skills to either port the existing Hunspell code (not 
sure about license issues) or write one from scratch based on papers, I'd be 
keen to take a look, help with testing, and maybe even send some pull requests :)
That'd be great not just for LT, but also for several other OSS projects. At the 
moment, when I need a simple dictionary and don't need all the features of 
Hunspell, I prefer to use Jazzy.
Cheers,
Bruno
 
      From: Daniel Naber <daniel.na...@languagetool.org>
 To: LanguageTool Developer List <languagetool-devel@lists.sourceforge.net> 
 Sent: Wednesday, 29 June 2016 9:28 PM
 Subject: The spell checker issue
   
Hi,

yesterday I tried to update the English dictionary that LT includes. The 
details are documented at 
https://github.com/languagetool-org/languagetool/issues/329 but in a 
nutshell: our spell checking is so complicated that the dictionary 
update didn't work.

We really need a process that allows us to use hunspell 
dictionaries directly, without conversion to other formats. The original 
reason we don't use hunspell (or only parts of it) is that it's slow, 
especially when it comes to generating suggestions. Today I ran a test 
with hunspell 1.4.1 and LT, and it turns out LT is about 4-5 times 
faster.

What could be a solution:

A) Improve hunspell to be faster. We'd need someone who can do this and 
then we'd still rely on native code, which isn't what we want in Java 
(but we've lived with it for years now).

B) Finally write a Java-based spell checker that can read hunspell 
dictionaries. The internet is full of spell checkers, but we need one 
with support for advanced features like compound words (important for 
German).

C) I don't know, do you have an idea?

If we cannot find a solution, the current situation will persist so that 
some dictionaries probably won't be updated.

Regards
  Daniel

This is the text for testing, full of typos (supposed to be German):
Fgen Siex hxier Ixhren Txext eiwen. Klcken ie nch dr Prüung aug diw 
fatbig
unteelegten Textstellwn. oder notzen Sie desen Teyt alls Beeispiel füür 
eein
Paat Fwhler , diw LanguageTool erkwnnen ksnn: Ih wirde Ankst und banke.


------------------------------------------------------------------------------
Attend Shape: An AT&T Tech Expo July 15-16. Meet us at AT&T Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


   