How is that done?
Ruud
On 16-09-14 at 13:23, Jaume Ortolà i Font wrote:
2014-09-16 13:03 GMT+02:00 R.Baars <baar...@xs4all.nl>:
I see. This is probably of no use for spellchecking, but it is for
postagging.
It gives no suggestions, but it can be used to avoid false
positives in spellchecking, if you configure tagged words to be
ignored.
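To illustrate the idea of ignoring tagged words, here is a minimal Python sketch. It is not LanguageTool code: the function name `spelling_errors`, the plain set used as a dictionary, and the list of per-token tags are all assumptions made for the example.

```python
def spelling_errors(tokens, tags, dictionary):
    """Return the tokens a naive spell checker would report.

    tokens     -- list of word strings
    tags       -- list of the same length; None means "no POS tag"
    dictionary -- set of known lowercase words (hypothetical word list)
    """
    return [
        tok
        for tok, tag in zip(tokens, tags)
        # A token that carries any tag is skipped, even if it is not in the dictionary.
        if tag is None and tok.lower() not in dictionary
    ]
```

With this approach, only tokens that actually carry a tag are protected. An untagged token in the middle of a longer multiword would still be reported.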
Does
Abu Dhabi NPCNG00
cause both words to be tagged with that tag, or are they
considered 1 token with that postag?
Tokenization is not changed. In this case:
<token postag="<NPCNG00>">Abu</token>
<token postag="</NPCNG00>">Dhabi</token>
If there are more than two tokens, the inside tokens are not tagged.
Perhaps this should be optionally changed (i.e., tag the inside tokens too).
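The tagging behaviour described here can be sketched in Python as follows. This is only an illustration of the behaviour as described in this thread, not the actual MultiwordChunker implementation. The function name `tag_multiwords`, the dictionary layout and the tag "NP" used in the usage note are assumptions.

```python
def tag_multiwords(tokens, multiwords):
    """Return one tag (or None) per token; tokenization is left unchanged.

    multiwords maps a tuple of lowercase tokens to a tag,
    e.g. ("abu", "dhabi") -> "NPCNG00".
    """
    tags = [None] * len(tokens)
    lowered = [t.lower() for t in tokens]  # matching is case-insensitive
    for start in range(len(tokens)):
        for words, tag in multiwords.items():
            if len(words) < 2:
                continue  # assumption: entries have at least two tokens
            end = start + len(words)
            if tuple(lowered[start:end]) == words:
                tags[start] = f"<{tag}>"      # first token of the multiword
                tags[end - 1] = f"</{tag}>"   # last token of the multiword
                # Inside tokens (if any) keep tags[i] == None.
    return tags
```

For example, `tag_multiwords(["New", "York", "City"], {("new", "york", "city"): "NP"})` marks only the first and last token and leaves "York" untagged.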
Regards,
Jaume
(Might come in handy for just this tagging..)
Ruud
On 16-09-14 at 12:56, Jaume Ortolà i Font wrote:
Hi, Ruud.
I can't find any documentation. It is used in Polish, French,
Catalan, Russian, Ukrainian and Spanish.
Implementation:
Enable it (Java).
Create a "multiwords.txt" file in your resources folder, like these
[1]. On each line, the tokens are separated by whitespace and the tag
is separated from the tokens by a tab.
Result:
The first token of the multiword is tagged with "<POSTAG>" and
the last token is tagged with "</POSTAG>".
The MultiwordChunker is case-insensitive. I would like to make it
configurable, especially for words with an uppercase first letter.
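As a rough illustration of the file format, the following Python sketch reads such a list into a lookup table. It is not the Java code used by LanguageTool. The function name `parse_multiwords` is made up, and treating lines starting with "#" as comments is an assumption.

```python
def parse_multiwords(text):
    """Parse multiwords.txt-style content into {tuple_of_tokens: tag}.

    Each line holds whitespace-separated tokens, then a tab, then the tag.
    Tokens are lowercased because the matching is case-insensitive.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: "#" starts a comment
            continue
        words, sep, tag = line.rpartition("\t")  # split at the last tab
        if not sep:
            continue  # no tab means no tag, so the line is skipped
        entries[tuple(words.lower().split())] = tag.strip()
    return entries
```

For instance, `parse_multiwords("Abu Dhabi\tNPCNG00\n")` yields a single entry keyed by `("abu", "dhabi")`.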
Regards,
Jaume
[1]
https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/pl/src/main/resources/org/languagetool/resource/pl/multiwords.txt
https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ca/src/main/resources/org/languagetool/resource/ca/multiwords.txt
2014-09-16 12:33 GMT+02:00 R.Baars <baar...@xs4all.nl>:
Jaume, thanks, but I am not sure.
Depends on its implementation I think.
Where can I find more info?
Ruud
On 16-09-14 at 12:26, Jaume Ortolà i Font wrote:
2014-09-16 11:21 GMT+02:00 R.J. Baars <r.j.ba...@xs4all.nl>:
We don't agree. There is a spellchecker, but also a single word
ignore list for it.
There are XML rules, but also a Simplereplace rule, a compounding rule.
So apart from the hammer and the screwdriver, there are more tools.
There is indeed another tool for multi-words. It seems that
Ruud doesn't know it.
We can enable a HybridDisambiguator and add a
MultiwordChunker to the disambiguation. With this you can
write a list of "multi-words" with its corresponding tag in
a plain text file (multiwords.txt).
I use the MultiwordChunker with two objectives: improve
disambiguation and avoid spelling matches in multiwords.
Would it be useful for you, Ruud?
Regards,
Jaume
But anyway, adding the most frequent ones to the disambiguator works.
Getting rid of wrong postags and the 10% reported possible spelling
errors on the entire corpus is a higher priority.
And fixing false positives. Having almost doubled the number of rules
is enough for this month.
Ruud
> On 2014-09-16 at 09:03, R.J. Baars wrote:
>> A word like 'Aviv' is not correct unless 'Tel' is before it.
>> So it is best to leave Tel and Aviv out of the spell checker.
>> That results in spell checking reporting errors for Aviv.
>>
>> In the disambiguator, there is the option to block that, by making an
>> immunizing rule:
>>
>> <!-- Tel Aviv -->
>> <rule id="TEL_AVIV" name="Tel Aviv">
>>   <pattern>
>>     <token>Tel</token>
>>     <token>Aviv</token>
>>   </pattern>
>>   <disambig action="ignore_spelling"/>
>> </rule>
>>
>> That works perfectly. But then, there are a lot of these word
>> combinations. Wouldn't it be better to have a multi-word ignore list for
>> the spell checker?
>>
>> (Or even a multi-word spell checker, not just knowing 'correct' and 'not
>> in list', but 'correct', 'incorrect' and 'not in list')
>
> It would not be an enhancement, as this would not give new functionality
> but cripple the existing one. Also, the ability to use all XML syntax is
> extremely important to me (I use POS tags and regular expressions), so I
> wouldn't make use of the multi-word spell checker anyway. So we'd have
> to introduce a crippled syntax that would look a little bit different
> for a human being but with no meaningful functional change. I don't
> think it's worth our time.
>
> The spell checker is best for checking individual words. Just like a
> hammer, it's good for nails, and not for screws. For screws, we have a
> screwdriver. For multi-word entities, we have more refined tools, like
> tagging and disambiguation and special attributes.
>
> Best,
> Marcin
>
>
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce.
> Perforce version control. Predictably reliable.
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel