Hi.

I finally have a full keyboard, so I can elaborate a bit on the expansion issue.

Spell checking is supposed to signal any incorrect word. So most correct 
words should be accepted.
There are words in between, though: words that are technically correct 
but, in everyday language use, most commonly a mistake for a different word.

Example for Dutch: "si" is one of the notes in do-re-mi-fa-sol-la-si-do, 
so it is technically correct. But in over 80% of the hits in Dutch 
sentences, it is a mistake for "is". So it has intentionally been left out 
of the list of correct words, even though it is correct.
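
To illustrate the idea, here is a minimal sketch in Python. The word list 
and the function name flag_words are made up for this example and are not 
taken from the actual Dutch dictionary; the only point is that a real word 
that is deliberately left out gets flagged like any other unknown word.

```python
# Hypothetical mini word list. "si" is deliberately left out,
# although it is a real word (a note name).
ACCEPTED = {"het", "is", "een", "noot", "do", "re", "mi"}

def flag_words(sentence, accepted=ACCEPTED):
    """Return the words of `sentence` that are not in the accepted list."""
    return [word for word in sentence.lower().split() if word not in accepted]

print(flag_words("Het si een noot"))  # ['si']
print(flag_words("Het is een noot"))  # []
```

The cost is that the rare correct use of "si" is flagged too, which is the 
trade-off described above.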

When compounding is used, some compounding parts will accidentally 
combine into a word that is technically correct, but still a mistake 
most of the time. Example: muskaatnoot (nutmeg) is correct, but 
muskaatnood could just as easily be generated, since nood (emergency) 
is a compounding part too.
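
A rough sketch of why this happens, again in Python. The part lists, the 
blocked set and the function name expand are assumptions for illustration 
only; real Hunspell compounding rules are considerably richer than simple 
concatenation.

```python
from itertools import product

# Hypothetical compounding parts, not the real dictionary entries.
FIRST_PARTS = ["muskaat"]
SECOND_PARTS = ["noot", "nood"]

# Compounds that are formally valid but almost always a typo for another word.
BLOCKED = {"muskaatnood"}

def expand(first_parts, second_parts, blocked=frozenset()):
    """Generate all two-part compounds, skipping the blocked ones."""
    return [a + b for a, b in product(first_parts, second_parts)
            if a + b not in blocked]

print(expand(FIRST_PARTS, SECOND_PARTS))           # ['muskaatnoot', 'muskaatnood']
print(expand(FIRST_PARTS, SECOND_PARTS, BLOCKED))  # ['muskaatnoot']
```

Without a blocked list, every part that is allowed in second position 
combines with every first part, so the expanded list contains the unwanted 
forms as well.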

No matter how carefully compounds have been selected, lots of nonsense 
words have been reported as Hunspell suggestions since the Hunspell 
dictionary for Dutch introduced compounding.

Because of that, the current Hunspell dictionary is not good base 
material for expansion. The one being built now, to be released at the 
end of this year (hopefully; it will be 1 year late by then), could be 
better base material for expansion.

Ruud




On 30-04-13 18:58, Marcin Miłkowski wrote:
> On 2013-04-30 17:36, Daniel Naber wrote:
>> On 30.04.2013, 09:47:28 Marcin Miłkowski wrote:
>>
>>> Why? It's just for internal processing, not for maintaining the
>>> dictionary. What might possibly go wrong?
>> Maybe I didn't follow this discussion closely enough, but does that work
>> for compounds with more than two parts? At least for German, these are less
>> common than compounds made up of two words, but not uncommon enough to
>> ignore them.
> If hunspell specifies them as valid words, then they would be present.
>
> If it turns out that the generated file gets too large to handle, we
> will think of other ways of compounding without hunspell (maybe in a
> rule-based way).
>
> Hm, as I think of it, if there are conversions from hunspell to HFST for
> German or Dutch dictionaries, then if HFST allows to print all the words
> it contains, we should have a very simple way to convert the dictionary.
>
> Regards,
> Marcin
>
> _______________________________________________
> Languagetool-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel

