2013/11/25 Daniel Naber <list2...@danielnaber.de>

> On 2013-11-25 11:11, Jaume Ortolà i Font wrote:
>
> > -  A method for building the dictionary, assuming that it will be
> > used only for some languages (backward compatible).
> > - A way of using the frequency information in the ordering of
> > suggestions. For example:
> > new distance = current distance *10 + a number between 0 and 9
> > (A-K).
>
> Sounds good to me. There are lists of word occurrences on the web, maybe
> we can use them if the license is okay. If not, the process of creating
> the list should be reproducible, i.e. it should rely on data that's
> freely available. This might not be so easy, just using Wikipedia
> doesn't seem appropriate because of its style.
>
>
Have a look at these word lists [1]. They are licensed under Apache 2.0,
and the words are classified into 256 frequency ranges.

(The Catalan list is more or less OK, but its tokenization is not the same
as in LT. I could build a better one from other sources, but that corpus
data is not freely available.)
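
To make the ordering idea from my earlier mail concrete, here is a minimal
sketch in Python (not LT code). It assumes each candidate comes with an edit
distance and a frequency class from 0 to 255, and that 255 means "most
frequent"; the actual orientation of the classes in the lists at [1] would
need to be checked. The function names are made up for illustration.

```python
def frequency_penalty(freq_class, num_classes=256):
    """Map a frequency class to a penalty between 0 and 9.

    Assumption: class num_classes - 1 is the most frequent range and
    class 0 the rarest, so common words get the smallest penalty.
    """
    if not 0 <= freq_class < num_classes:
        raise ValueError("frequency class out of range")
    return 9 - (freq_class * 10) // num_classes


def combined_distance(distance, freq_class):
    """new distance = current distance * 10 + a number between 0 and 9."""
    return distance * 10 + frequency_penalty(freq_class)


def order_suggestions(candidates):
    """Sort (word, distance, freq_class) tuples by the combined distance."""
    ranked = sorted(candidates, key=lambda c: combined_distance(c[1], c[2]))
    return [word for word, _, _ in ranked]
```

Because the original distance is multiplied by 10 and the penalty never
exceeds 9, frequency only breaks ties between suggestions at the same
distance; it never moves a more distant word ahead of a closer one.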

Regards,
Jaume Ortolà

[1] https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
