Thanks, Jan, for supporting. LT now appears to have 2 purposes for a words list: POS tagging and spell checking. Maybe these could be combined into one list, just by adding a flag to each word with an error-probability value. That way it would still be possible to 'expand' a Hunspell dictionary to create the biggest possible word list for POS tagging, while keeping the valuable spell checking info, with correctness levels like 'known error (100%)', 'probable error', 'might be error', 'extra info'. The levels below 100% could be accompanied by rules as well.
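To make the idea a bit more concrete, here is a minimal sketch in Python of one word list serving both purposes. Everything in it is an assumption for illustration, not LT's actual format: the tab-separated layout, the `Entry` type, the tag names, the numeric cut-offs and the probabilities given to the sample words. Only the level names are taken from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-offs: only the level names come from the mail,
# the numbers are assumptions chosen for illustration.
LEVELS = [
    (1.0, "known error"),
    (0.8, "probable error"),
    (0.4, "might be error"),
]


@dataclass(frozen=True)
class Entry:
    word: str
    pos_tag: str              # hypothetical tag names, not LT's tagset
    error_probability: float  # 0.0 = plain correct word, 1.0 = known error


def parse_line(line: str) -> Entry:
    """Parse one 'word<TAB>tag<TAB>probability' line (assumed layout)."""
    word, pos_tag, prob = line.rstrip("\n").split("\t")
    return Entry(word, pos_tag, float(prob))


def level(entry: Entry) -> str:
    """Map the error probability of an entry to a correctness level."""
    for threshold, name in LEVELS:
        if entry.error_probability >= threshold:
            return name
    return "extra info" if entry.error_probability > 0.0 else "correct"


# Illustrative sample; the probabilities are made up for the example.
SAMPLE = (
    "is\tVERB\t0.0\n"
    "si\tNOUN\t0.8\n"
    "muskaatnoot\tNOUN\t0.0\n"
    "muskaatnood\tNOUN\t0.8\n"
)

ENTRIES = {e.word: e for e in map(parse_line, SAMPLE.splitlines())}


def pos_tag(word: str) -> Optional[str]:
    """POS tagging uses every listed word, whatever its error level."""
    entry = ENTRIES.get(word)
    return entry.pos_tag if entry else None


def spelling_level(word: str) -> str:
    """Spell checking reads the error flag of the same entry."""
    entry = ENTRIES.get(word)
    return "unknown word" if entry is None else level(entry)


if __name__ == "__main__":
    for w in ("is", "si", "muskaatnood", "xyzzy"):
        print(w, pos_tag(w), spelling_level(w))
```

With a list like this, 'si' can still be tagged as a noun, while the spell checker flags it at the 'probable error' level instead of leaving it out of the list altogether.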
Ruud

On 03-05-13 23:14, Jan Schreiber wrote:
> The problem with the compounds in Hunspell that Ruud described exists
> for German as well. Just saying.
>
> On 03.05.2013 13:07, Ruud Baars wrote:
>> Hi.
>>
>> Finally I have a full keyboard, to elaborate a bit on the expansion issue.
>>
>> Spell checking is supposed to signal any incorrect word. So most correct
>> words should be accepted.
>> There are words in between, though: words that are technically correct,
>> but in everyday language use are most commonly a mistake for a different word.
>>
>> Example for Dutch: si is one of the notes in do-re-mi-fa-sol-la-si-do.
>> So it is technically correct. But in over 80% of the hits in Dutch
>> sentences it is a mistake for is. So it has intentionally been left out
>> of the correct words list, even though it is correct.
>>
>> When compounding is used, some compounding parts will accidentally
>> combine into a word that is technically correct, but still most of the
>> time a mistake. Example: muskaatnoot (nutmeg) is correct, but
>> muskaatnood could easily be generated as well, since nood (emergency) is a
>> compounding part too.
>>
>> No matter how carefully compounds have been selected, lots of nonsense
>> words have been reported as Hunspell suggestions since the Hunspell
>> dictionary for Dutch introduced compounding.
>>
>> Because of that, it is not good base material for expansion. The one
>> being built now, to be released at the end of this year (hopefully; it
>> will be 1 year late by then), could be better base material for expansion.
>>
>> Ruud

_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
