2014-09-16 11:21 GMT+02:00 R.J. Baars <r.j.ba...@xs4all.nl>:

> We don't agree. There is a spellchecker, but also a single-word ignore
> list for it.
> There are XML rules, but also a SimpleReplace rule and a compounding rule.
>
> So apart from the hammer and the screwdriver, there are more tools.
>
>
There is indeed another tool for multi-word expressions, which Ruud may not
know about.

We can enable a HybridDisambiguator and add a MultiwordChunker to the
disambiguation. With it, you can list multi-word expressions, each with its
corresponding tag, in a plain-text file (multiwords.txt).

I use the MultiwordChunker for two purposes: to improve disambiguation and
to avoid spelling-error matches inside multi-word expressions.
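A rough sketch of the idea behind such a chunker, in Java. This is NOT LanguageTool's actual MultiwordChunker code: the class name MultiwordSketch, the assumed "phrase<TAB>tag" line format and the tag NP are hypothetical, for illustration only. At each position it looks for the longest listed phrase, gives every covered token that phrase's tag, and marks those tokens as immune to spelling errors. That covers both purposes above, while a lone "Aviv" is left alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not LanguageTool's MultiwordChunker.
 * Assumed input format: one "phrase<TAB>tag" entry per line, '#' for comments.
 */
public class MultiwordSketch {

    /** One token with its multi-word tag (null if none) and spelling immunity. */
    public record Marked(String token, String tag, boolean ignoreSpelling) {}

    private final Map<List<String>, String> phrases = new HashMap<>();
    private int maxLen = 0;

    public MultiwordSketch(List<String> lines) {
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) continue; // skip comments and empty lines
            String[] parts = line.split("\t");
            if (parts.length != 2) continue; // ignore lines not in the assumed format
            List<String> words = List.of(parts[0].trim().split("\\s+"));
            phrases.put(words, parts[1].trim());
            maxLen = Math.max(maxLen, words.size());
        }
    }

    /** Greedy longest match: tokens inside a listed phrase get its tag and skip spelling. */
    public List<Marked> mark(List<String> tokens) {
        List<Marked> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String tag = null;
            int len = 0;
            // Try the longest possible phrase first (only multi-word phrases, n >= 2).
            for (int n = Math.min(maxLen, tokens.size() - i); n >= 2; n--) {
                String t = phrases.get(tokens.subList(i, i + n)); // List equality is element-wise
                if (t != null) {
                    tag = t;
                    len = n;
                    break;
                }
            }
            if (tag != null) {
                for (int k = 0; k < len; k++) {
                    out.add(new Marked(tokens.get(i + k), tag, true));
                }
                i += len;
            } else {
                out.add(new Marked(tokens.get(i), null, false));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        MultiwordSketch chunker = new MultiwordSketch(List.of(
                "# hypothetical multiwords.txt content",
                "Tel Aviv\tNP"));
        for (Marked m : chunker.mark(List.of("I", "live", "in", "Tel", "Aviv", "."))) {
            System.out.println(m);
        }
    }
}
```

Only "Tel" and "Aviv" in sequence are tagged and exempted from spelling; every other token passes through untouched, so the spell checker can still flag "Aviv" on its own.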

Would it be useful for you, Ruud?

Regards,
Jaume





> But anyway, adding the most frequent ones to the disambiguator works.
>
> Getting rid of wrong POS tags and of the 10% reported possible spelling
> errors on the entire corpus is a higher priority, and so is fixing false
> positives. Having almost doubled the number of rules is enough for this
> month.
>
> Ruud
>
>
>
> > On 2014-09-16 at 09:03, R.J. Baars wrote:
> >> A word like 'Aviv' is not correct unless 'Tel' is before it.
> >> So it is best to leave Tel and Aviv out of the spell checker.
> >> That results in the spell checker reporting errors for Aviv.
> >>
> >> In the disambiguator, there is an option to block that by writing an
> >> immunizing rule:
> >>
> >>    <!-- Tel Aviv-->
> >>    <rule id="TEL_AVIV" name="Tel Aviv">
> >>      <pattern>
> >>        <token>Tel</token>
> >>        <token>Aviv</token>
> >>      </pattern>
> >>      <disambig action="ignore_spelling"/>
> >>    </rule>
> >>
> >> That works perfectly. But then, there are a lot of these word
> >> combinations. Wouldn't it be better to have a multi-word ignore list for
> >> the spell checker?
> >>
> >> (Or even a multi-word spell checker, not just knowing 'correct' and 'not
> >> in list', but 'correct', 'incorrect' and 'not in list')
> >
> > It would not be an enhancement, as this would not give new functionality
> > but cripple the existing one. Also, the ability to use all XML syntax is
> > extremely important to me (I use POS tags and regular expressions), so I
> > wouldn't make use of the multi-word spell checker anyway. So we'd have
> > to introduce a crippled syntax that would look a little bit different
> > for a human being but with no meaningful functional change. I don't
> > think it's worth our time.
> >
> > The spell checker is best for checking individual words. Just like a
> > hammer, it's good for nails, and not for screws. For screws, we have a
> > screwdriver. For multi-word entities, we have more refined tools, like
> > tagging and disambiguation and special attributes.
> >
> > Best,
> > Marcin
> >
> >
> ------------------------------------------------------------------------------
> > Want excitement?
> > Manually upgrade your production database.
> > When you want reliability, choose Perforce.
> > Perforce version control. Predictably reliable.
> >
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Languagetool-devel mailing list
> > Languagetool-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/languagetool-devel
> >