words frequencies

R.J. Baars Mon, 13 Oct 2014 23:20:08 -0700

I am currently exporting word frequencies for all languages I have
collected over the years.


These frequency lists are 'dirty', which means the words have not been
checked for correctness.
That will be handled by the speller anyway. Spell checker maintainers
could also use the lists as input.

The counting is also case- and accent-specific. So Maxima, maxima and
Máxima (the capitalized plural of 'the most', the lowercase plural of
'the most', and the name of our queen) are counted separately.
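
To illustrate what case- and accent-specific counting means, here is a
minimal Python sketch. It is not the code used to build the lists; the
function names, the sample tokens and the NFC normalization step are all
assumptions made for the example.

```python
from collections import Counter
import unicodedata


def count_words(tokens):
    """Count tokens exactly as written: case and accents stay significant."""
    # Assumption: NFC normalization is applied so that two encodings of the
    # same accented letter ('a' + combining acute vs. precomposed 'á') end up
    # as one key. It does not strip accents or change case.
    return Counter(unicodedata.normalize("NFC", t) for t in tokens)


def fold(word):
    """Hypothetical helper: drop accents and case to merge variants later."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


# Sample input; the last token is 'Máxima' written with a combining accent.
tokens = ["Maxima", "maxima", "M\u00e1xima", "maxima", "Ma\u0301xima"]
counts = count_words(tokens)

for word, n in sorted(counts.items()):
    print(word, n)
```

A consumer that does not care about case or accents could merge the
separate entries afterwards by summing the counts of all words that share
the same fold() result.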

The data is entirely UTF-8 encoded.

The gaia header is a little bit off; I had no description for the language
at hand, and there is no real use for a version.

But there should be no problem using the data to add frequency classes to
the Morfologik speller for LT.

If you can spare a bit of computer time on a workstation, you could help
collect this kind of data by running a tiny Java application:

data.taaltik.nl/tool/TaalTik.jar




Is there anyone who could help me figure out an appropriate license?
(I don't know anything about those.)

Ruud


