Thanks for the support, Jan.

LT now appears to have two uses for a word list: POS tagging and spell
checking.
Maybe these could be combined into one list by adding a flag to each
word that carries an error-probability value. That way it would still be
possible to 'expand' a Hunspell dictionary to create the largest possible
word list for POS tagging, while keeping the valuable spell-checking
information as correctness levels such as 'known error (100%)',
'probable error', 'might be error', and 'extra info'.
The levels below 100% could be accompanied by rules as well.
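
To make the idea concrete, here is a rough sketch in Python of what such a combined list might look like. Nothing in it is existing LT code: the names ErrorLevel, WORD_LIST, known_to_tagger and is_flagged are made up for illustration, and the levels assigned to the example words are my own guesses.

```python
from enum import IntEnum
from typing import Dict, Optional


class ErrorLevel(IntEnum):
    """Hypothetical correctness levels, ordered from weakest to strongest."""
    EXTRA_INFO = 1
    MIGHT_BE_ERROR = 2
    PROBABLE_ERROR = 3
    KNOWN_ERROR = 4  # the '100%' level


# One list for both purposes. None means 'plain correct word'.
# The levels given here are assumptions, not measured values.
WORD_LIST: Dict[str, Optional[ErrorLevel]] = {
    "is": None,
    "muskaatnoot": None,
    "si": ErrorLevel.PROBABLE_ERROR,           # a note, but usually a typo for 'is'
    "muskaatnood": ErrorLevel.PROBABLE_ERROR,  # valid compound, usually a mistake
}


def known_to_tagger(word: str) -> bool:
    """The POS tagger can use every entry, whatever its flag."""
    return word in WORD_LIST


def is_flagged(word: str,
               threshold: ErrorLevel = ErrorLevel.PROBABLE_ERROR) -> bool:
    """The spell checker reports unknown words, plus listed words whose
    flag is at or above the chosen threshold."""
    if word not in WORD_LIST:
        return True
    level = WORD_LIST[word]
    return level is not None and level >= threshold


if __name__ == "__main__":
    for w in ("is", "si", "muskaatnoot", "muskaatnood"):
        print(w, known_to_tagger(w), is_flagged(w))
```

With a threshold, the strictness of the spell checker could be chosen per use, while the tagger always sees the full list.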

Ruud



On 03-05-13 23:14, Jan Schreiber wrote:
> The problem with the compounds in Hunspell that Ruud described exists
> for German as well. Just saying.
>
> On 03.05.2013 13:07, Ruud Baars wrote:
>> Hi.
>>
>> Finally I have a full keyboard, to elaborate a bit on the expansion issue.
>>
>> Spell checking is supposed to signal any incorrect word. So most correct
>> words should be accepted.
>> There are words in between though: words that are technically correct,
>> but in everyday language use are most commonly a mistake for a different word.
>>
>> Example for Dutch: 'si' is one of the notes in do-re-mi-fa-sol-la-si-do.
>> So it is technically correct. But in over 80% of its occurrences in Dutch
>> sentences it is a mistake for 'is'. So it has intentionally been left out
>> of the correct words list, even though it is correct.
>>
>> When compounding is used, some compound parts will accidentally
>> combine into a word that is technically correct, but still most of the
>> time a mistake. Example: muskaatnoot (nutmeg) is correct, but
>> muskaatnood could just as easily be generated, since nood (emergency)
>> is a compound part too.
>>
>> No matter how carefully compounds have been selected, lots of nonsense
>> words have been reported as Hunspell suggestions since the Hunspell
>> dictionary for Dutch introduced compounding.
>>
>> Because of that, it is not a good base material for expansion. The one
>> being built now, to be released at the end of this year (hopefully; it
>> is 1 year late then), could be better base material for expansion.
>>
>> Ruud
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite
> It's a free troubleshooting tool designed for production
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://p.sf.net/sfu/appdyn_d2d_ap2
> _______________________________________________
> Languagetool-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel

