Ruud, I want to tag many more words than I want to accept as correctly
spelled. It would be an administrative nightmare to join these lists,
even more so for English with its variants...
Marcin
On 04-05-2013 09:47, "Ruud Baars" <[email protected]> wrote:
> Thanks, Jan, for supporting.
>
> LT now appears to have two purposes for a word list: POS tagging and spell
> checking.
> Maybe these could be combined into one list, just by adding a flag to the
> words with an error-probability value. That way it would still be possible
> to 'expand' a Hunspell dictionary to create the biggest possible
> word list for POS tagging, while keeping the valuable spell-checking info,
> with correctness levels like 'known error (100%)', 'probable error',
> 'might be error', 'extra info'.
> The levels below 100% could be accompanied by rules as well.
>
> Ruud
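
The combined list proposed above could be sketched roughly as follows. This is only an illustration, not LanguageTool code: the `Entry` class, the `Correctness` enum, the POS tag names, the extra `ACCEPTED` level and the level assigned to each sample word are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Correctness(Enum):
    """Levels named in the message, plus an assumed ACCEPTED level."""
    ACCEPTED = "accepted"
    EXTRA_INFO = "extra info"
    MIGHT_BE_ERROR = "might be error"
    PROBABLE_ERROR = "probable error"
    KNOWN_ERROR = "known error (100%)"


@dataclass(frozen=True)
class Entry:
    word: str
    pos_tags: tuple
    level: Correctness


# Hypothetical sample entries; tags and levels are invented for illustration.
WORDS = {
    e.word: e
    for e in [
        Entry("is", ("VERB",), Correctness.ACCEPTED),
        Entry("si", ("NOUN",), Correctness.PROBABLE_ERROR),
        Entry("muskaatnoot", ("NOUN",), Correctness.ACCEPTED),
        Entry("muskaatnood", ("NOUN",), Correctness.PROBABLE_ERROR),
    ]
}


def pos_tags(word):
    """The tagger uses every entry, whatever its correctness level."""
    entry = WORDS.get(word)
    return entry.pos_tags if entry else ()


def spelling_status(word):
    """Return the stored level, or None if the word is not in the list."""
    entry = WORDS.get(word)
    return entry.level if entry else None
```

With one list, the tagger still gets a tag for 'si', while the speller can see that it is probably a mistake and leave the final decision to a rule.
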
>
>
>
> On 03-05-13 23:14, Jan Schreiber wrote:
> > The problem with the compounds in Hunspell that Ruud described exists
> > for German as well. Just saying.
> >
> > Am 03.05.2013 13:07, schrieb Ruud Baars:
> >> Hi.
> >>
> >> Finally I have a full keyboard, so I can elaborate a bit on the
> >> expansion issue.
> >>
> >> Spell checking is supposed to signal any incorrect word. So most correct
> >> words should be accepted.
> >> There are words in between, though: words that are technically correct,
> >> but in everyday language use are most commonly a mistake for a different
> >> word.
> >>
> >> Example for Dutch: 'si' is one of the notes in do-re-mi-fa-sol-la-si-do.
> >> So it is technically correct. But in over 80% of the hits in Dutch
> >> sentences it is a mistake for 'is'. So it has intentionally been left out
> >> of the correct words list, even though it is correct.
> >>
> >> When compounding is used, some compounding parts will accidentally
> >> combine into a word that is technically correct, but still most of the
> >> time a mistake. Example: muskaatnoot (nutmeg) is correct, but
> >> muskaatnood could just as easily be generated, since nood (emergency)
> >> is a compounder too.
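
A toy illustration of this overgeneration, assuming a naive generator that joins every first part with every last part. This is not how Hunspell implements compounding; the part lists and the `VERIFIED` set are hypothetical.

```python
from itertools import product

# Hypothetical compound parts, invented for this example.
FIRST_PARTS = ["muskaat"]
LAST_PARTS = ["noot", "nood"]

# Hypothetical set of compounds that were checked by hand.
VERIFIED = {"muskaatnoot"}


def generate_compounds(first_parts, last_parts):
    """Naively join every first part with every last part."""
    return [a + b for a, b in product(first_parts, last_parts)]


def split_by_verification(candidates, verified):
    """Separate hand-checked compounds from merely generated ones."""
    accepted = [w for w in candidates if w in verified]
    suspicious = [w for w in candidates if w not in verified]
    return accepted, suspicious


candidates = generate_compounds(FIRST_PARTS, LAST_PARTS)
accepted, suspicious = split_by_verification(candidates, VERIFIED)
print(accepted)    # ['muskaatnoot']
print(suspicious)  # ['muskaatnood']
```

Every generated form is well-formed as a compound, so the generator alone cannot tell the real word from the accidental one.
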
> >>
> >> No matter how carefully compounds have been selected, lots of nonsense
> >> words have been reported as Hunspell suggestions since the Hunspell
> >> dictionary for Dutch introduced compounding.
> >>
> >> Because of that, it is not good base material for expansion. The one
> >> being prepared now, to be released at the end of this year (hopefully;
> >> it will be 1 year late by then), could be better base material.
> >>
> >> Ruud
> >
> > ------------------------------------------------------------------------------
> > Get 100% visibility into Java/.NET code with AppDynamics Lite
> > It's a free troubleshooting tool designed for production
> > Get down to code-level detail for bottlenecks, with <2% overhead.
> > Download for free and get started troubleshooting in minutes.
> > http://p.sf.net/sfu/appdyn_d2d_ap2
> > _______________________________________________
> > Languagetool-devel mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/languagetool-devel