Jaume, thanks, but I am not sure.

It depends on its implementation, I think.

Where can I find more info?

Ruud

On 16-09-14 at 12:26, Jaume Ortolà i Font wrote:
2014-09-16 11:21 GMT+02:00 R.J. Baars <r.j.ba...@xs4all.nl>:

    We don't agree. There is a spellchecker, but also a single-word ignore
    list for it.
    There are XML rules, but also a Simplereplace rule and a compounding
    rule.

    So apart from the hammer and the screwdriver, there are more tools.


There is indeed another tool for multi-words. It seems that Ruud doesn't know about it.

We can enable a HybridDisambiguator and add a MultiwordChunker to the disambiguation. With this you can write a list of "multi-words", each with its corresponding tag, in a plain text file (multiwords.txt).

I use the MultiwordChunker for two purposes: to improve disambiguation and to avoid spelling matches in multi-words.
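
To illustrate the second purpose, here is a rough Python sketch of how a multi-word list could keep a spell checker from flagging a word such as 'Aviv' when it occurs inside a listed phrase. This is not LanguageTool's MultiwordChunker implementation. The function names, the tab-separated "phrase, tag" line format, and the NNP tag are all assumptions made for the example.

```python
def load_multiwords(lines):
    """Parse lines of the assumed form 'phrase<TAB>tag'.

    Blank lines and lines starting with '#' are skipped.
    Returns a dict mapping a tuple of words to its tag.
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        phrase, _, tag = line.partition("\t")
        entries[tuple(phrase.split())] = tag
    return entries


def immune_positions(tokens, multiwords):
    """Return the indexes of tokens that are part of a listed multi-word."""
    immune = set()
    for phrase in multiwords:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                immune.update(range(i, i + n))
    return immune


def spelling_errors(tokens, dictionary, multiwords):
    """Report unknown tokens, except those inside a listed multi-word."""
    immune = immune_positions(tokens, multiwords)
    return [tok for i, tok in enumerate(tokens)
            if i not in immune and tok not in dictionary]


# Hypothetical data: 'Tel' and 'Aviv' are deliberately not in the dictionary.
multiwords = load_multiwords(["# place names", "Tel Aviv\tNNP"])
dictionary = {"I", "flew", "to"}

print(spelling_errors("I flew to Tel Aviv".split(), dictionary, multiwords))  # []
print(spelling_errors("I flew to Aviv".split(), dictionary, multiwords))      # ['Aviv']
```

In this sketch 'Aviv' on its own is still reported, while 'Tel Aviv' as a whole is accepted, which is the behaviour the TEL_AVIV immunizing rule gives for a single phrase.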

Would it be useful for you, Ruud?

Regards,
Jaume



    But anyway, adding the most frequent ones to the disambiguator works.

    Getting rid of wrong POS tags and 10% reported possible spelling
    errors on the entire corpus is a higher priority, as is fixing false
    positives. Having almost doubled the number of rules is enough for
    this month.

    Ruud



    > On 2014-09-16 at 09:03, R.J. Baars wrote:
    >> A word like 'Aviv' is not correct unless 'Tel' is before it.
    >> So it is best to leave Tel and Aviv out of the spell checker.
    >> That results in the spell checker reporting errors for Aviv.
    >>
    >> In the disambiguator, there is the option to block that by making an
    >> immunizing rule:
    >>
    >>    <!-- Tel Aviv-->
    >>    <rule id="TEL_AVIV" name="Tel Aviv">
    >>      <pattern>
    >>        <token>Tel</token>
    >>        <token>Aviv</token>
    >>      </pattern>
    >>      <disambig action="ignore_spelling"/>
    >>    </rule>
    >>
    >> That works perfectly. But then, there are a lot of these word
    >> combinations. Wouldn't it be better to have a multi-word ignore list
    >> for the spell checker?
    >>
    >> (Or even a multi-word spell checker, not just knowing 'correct' and
    >> 'not in list', but 'correct', 'incorrect' and 'not in list')
    >
    > It would not be an enhancement, as this would not give new
    > functionality but cripple the existing one. Also, the ability to use
    > all XML syntax is extremely important to me (I use POS tags and
    > regular expressions), so I wouldn't make use of the multi-word spell
    > checker anyway. So we'd have to introduce a crippled syntax that would
    > look a little bit different for a human being but with no meaningful
    > functional change. I don't think it's worth our time.
    >
    > The spell checker is best for checking individual words. Just like a
    > hammer, it's good for nails, and not for screws. For screws, we have a
    > screwdriver. For multi-word entities, we have more refined tools, like
    > tagging and disambiguation and special attributes.
    >
    > Best,
    > Marcin
    >
    >
    
    > ------------------------------------------------------------------------------
    > Want excitement?
    > Manually upgrade your production database.
    > When you want reliability, choose Perforce.
    > Perforce version control. Predictably reliable.
    > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
    > _______________________________________________
    > Languagetool-devel mailing list
    > Languagetool-devel@lists.sourceforge.net
    > https://lists.sourceforge.net/lists/listinfo/languagetool-devel
    >



    




