2014-09-16 14:43 GMT+02:00 R.Baars <[email protected]>:

>  How is that done?
>
> Ruud
>
>
Do you mean ignoring tagged words in spellchecking (even if they are not in
the dictionary)? It's a configurable option of the speller (at least in the
Morfologik speller rule). Enabling it takes one line of Java code.
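
To illustrate the idea only: the following is a self-contained sketch, not
LanguageTool code. The class TaggedWordSpellerSketch, its Token type and the
ignoreTaggedWords flag are names invented for this example. It shows a
speller that skips any token that already carries a POS tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from the LanguageTool API.
class TaggedWordSpellerSketch {

  // A token with an optional POS tag (null means "untagged").
  static final class Token {
    final String text;
    final String posTag;

    Token(String text, String posTag) {
      this.text = text;
      this.posTag = posTag;
    }
  }

  private final Set<String> dictionary;
  private final boolean ignoreTaggedWords;

  TaggedWordSpellerSketch(Set<String> dictionary, boolean ignoreTaggedWords) {
    this.dictionary = dictionary;
    this.ignoreTaggedWords = ignoreTaggedWords;
  }

  // Returns the words that would be reported as possible spelling errors.
  List<String> misspelled(List<Token> tokens) {
    List<String> result = new ArrayList<>();
    for (Token token : tokens) {
      if (ignoreTaggedWords && token.posTag != null) {
        continue; // the token was tagged earlier, so it is not reported
      }
      if (!dictionary.contains(token.text)) {
        result.add(token.text);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> dictionary = new HashSet<>(Arrays.asList("lives", "in"));
    List<Token> tokens = Arrays.asList(
        new Token("lives", null),
        new Token("in", null),
        new Token("Abu", "<NPCNG00>"),
        new Token("Dhabi", "</NPCNG00>"));

    // Flag off: both untagged-dictionary misses are reported.
    System.out.println(new TaggedWordSpellerSketch(dictionary, false).misspelled(tokens));
    // Flag on: tagged tokens are skipped, so nothing is reported.
    System.out.println(new TaggedWordSpellerSketch(dictionary, true).misspelled(tokens));
  }
}
```

With the flag off, "Abu" and "Dhabi" are reported because they are not in the
toy dictionary; with the flag on, they are skipped because they are tagged.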

Jaume





>
> On 16-09-14 at 13:23, Jaume Ortolà i Font wrote:
>
>   2014-09-16 13:03 GMT+02:00 R.Baars <[email protected]>:
>
>>  I see. This is probably of no use for spellchecking, but it is for
>> postagging.
>>
>>
>  It gives no suggestions, but it can be used to avoid false positives
> in spellchecking, if you configure the speller to ignore tagged words.
>
>
>>
>> Does
>> Abu Dhabi NPCNG00
>> cause both words to be tagged with that tag, or are they considered 1
>> token with that postag?
>>
>>
>  Tokenization is not changed. In this case:
>
>  <token postag="<NPCNG00>">Abu</token>
> <token postag="</NPCNG00>">Dhabi</token>
>
>  If there are more than two tokens, the inside tokens are not tagged.
> Perhaps this should be made configurable (i.e., tag the inside tokens too).
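
A minimal sketch of the tagging behaviour described above, written as
standalone Java rather than taken from LanguageTool. The class
MultiwordTaggingSketch and its tag() method are hypothetical, whitespace
tokens are left out for simplicity, and the "Rio de Janeiro" entry with tag
"NP" is made up to show the three-token case.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of first/last-token tagging; not LanguageTool code.
class MultiwordTaggingSketch {

  // Returns one tag per token (null = untagged). Only the first and the last
  // token of a match are tagged, and matching ignores case. Entries with
  // fewer than two tokens are skipped in this sketch.
  static String[] tag(List<String> tokens, List<String> multiword, String posTag) {
    String[] tags = new String[tokens.size()];
    int n = multiword.size();
    if (n < 2) {
      return tags;
    }
    for (int start = 0; start + n <= tokens.size(); start++) {
      boolean match = true;
      for (int i = 0; i < n; i++) {
        if (!tokens.get(start + i).equalsIgnoreCase(multiword.get(i))) {
          match = false;
          break;
        }
      }
      if (match) {
        tags[start] = "<" + posTag + ">";
        tags[start + n - 1] = "</" + posTag + ">";
      }
    }
    return tags;
  }

  public static void main(String[] args) {
    // Two tokens: both are tagged.
    System.out.println(Arrays.toString(
        tag(Arrays.asList("Abu", "Dhabi"), Arrays.asList("Abu", "Dhabi"), "NPCNG00")));
    // Three tokens: the inside token stays untagged (null).
    System.out.println(Arrays.toString(
        tag(Arrays.asList("Rio", "de", "Janeiro"), Arrays.asList("Rio", "de", "Janeiro"), "NP")));
  }
}
```

Tagging the inside tokens too would be a one-line change in the "if (match)"
branch, which is why making it an option looks cheap.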
>
>  Regards,
> Jaume
>
>
>
>
>
>>  (Might come in handy for just this tagging..)
>>
>> Ruud
>>
>> On 16-09-14 at 12:56, Jaume Ortolà i Font wrote:
>>
>>  Hi, Ruud.
>>
>>  I can't find any documentation. It is used in Polish, French, Catalan,
>> Russian, Ukrainian and Spanish.
>>
>>  Implementation:
>>
>>  Enable it (Java).
>> Create a "multiwords.txt" in your resources folder like these [1]. The
>> tokens are separated by white space and the tag is separated by a tab.
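
As a rough sketch of that line format (tokens separated by white space, then
a tab, then the tag): the parser below is standalone Java written for
illustration, not the code LanguageTool uses to read multiwords.txt. The
class name MultiwordLineSketch and its fields are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for one "token token<TAB>TAG" line; not LanguageTool code.
class MultiwordLineSketch {
  final List<String> tokens;
  final String posTag;

  private MultiwordLineSketch(List<String> tokens, String posTag) {
    this.tokens = tokens;
    this.posTag = posTag;
  }

  // Splits the line at the first tab: the left part holds the tokens,
  // the right part holds the tag.
  static MultiwordLineSketch parse(String line) {
    int tab = line.indexOf('\t');
    if (tab < 0) {
      throw new IllegalArgumentException("Expected tokens, a tab, then a tag: " + line);
    }
    String phrase = line.substring(0, tab).trim();
    String tag = line.substring(tab + 1).trim();
    if (phrase.isEmpty() || tag.isEmpty()) {
      throw new IllegalArgumentException("Empty phrase or tag: " + line);
    }
    return new MultiwordLineSketch(Arrays.asList(phrase.split("\\s+")), tag);
  }

  public static void main(String[] args) {
    MultiwordLineSketch entry = parse("Abu Dhabi\tNPCNG00");
    System.out.println(entry.tokens + " -> " + entry.posTag);
  }
}
```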
>>
>>  Result:
>>
>>  The first token of the multiword is tagged with "<POSTAG>" and the last
>> token is tagged with "</POSTAG>".
>>
>>  The MultiwordChunker is case-insensitive. I would like to make it
>> configurable, especially for an uppercase first letter.
>>
>>  Regards,
>> Jaume
>>
>>
>>  [1]
>> https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/pl/src/main/resources/org/languagetool/resource/pl/multiwords.txt
>>
>>
>> https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ca/src/main/resources/org/languagetool/resource/ca/multiwords.txt
>>
>> 2014-09-16 12:33 GMT+02:00 R.Baars <[email protected]>:
>>
>>>  Jaume, thanks, but I am not sure.
>>>
>>> Depends on its implementation I think.
>>>
>>> Where can I find more info?
>>>
>>> Ruud
>>>
>>> On 16-09-14 at 12:26, Jaume Ortolà i Font wrote:
>>>
>>>   2014-09-16 11:21 GMT+02:00 R.J. Baars <[email protected]>:
>>>
>>>> We don't agree. There is a spellchecker, but also a single word ignore
>>>> list for it.
>>>> There are XML rules, but also a Simplereplace rule, a compounding rule.
>>>>
>>>> So apart from the hammer and the screwdriver, there are more tools.
>>>>
>>>>
>>>  There is indeed another tool for multi-words. It seems that Ruud
>>> doesn't know about it.
>>>
>>>  We can enable a HybridDisambiguator and add a MultiwordChunker to the
>>> disambiguation. With this you can write a list of "multi-words", each with
>>> its corresponding tag, in a plain text file (multiwords.txt).
>>>
>>>  I use the MultiwordChunker with two objectives: to improve disambiguation
>>> and to avoid spelling matches in multiwords.
>>>
>>>  Would it be useful for you, Ruud?
>>>
>>>  Regards,
>>> Jaume
>>>
>>>
>>>
>>>
>>>
>>>> But anyway, adding the most frequent ones to the disambiguator works.
>>>>
>>>> Getting rid of wrong postags and 10% reported possible spelling errors
>>>> on the entire corpus is a higher priority.
>>>> And fixing false positives. Having almost doubled the number of rules is
>>>> enough for this month.
>>>>
>>>> Ruud
>>>>
>>>>
>>>>
>>>> > On 2014-09-16 at 09:03, R.J. Baars wrote:
>>>> >> A word like 'Aviv' is not correct unless 'Tel' is before it.
>>>> >> So it is best to leave Tel and Aviv out of the spell checker.
>>>> >> That results in spell checking reporting errors for Aviv.
>>>> >>
>>>> >> In the disambiguator, there is the option to block that, by making an
>>>> >> immunizing rule:
>>>> >>
>>>> >>    <!-- Tel Aviv-->
>>>> >>    <rule id="TEL_AVIV" name="Tel Aviv">
>>>> >>      <pattern>
>>>> >>        <token>Tel</token>
>>>> >>        <token>Aviv</token>
>>>> >>      </pattern>
>>>> >>      <disambig action="ignore_spelling"/>
>>>> >>    </rule>
>>>> >>
>>>> >> That works perfectly. But then, there are a lot of these word
>>>> >> combinations. Wouldn't it be better to have a multi-word ignore list
>>>> >> for the spell checker?
>>>> >>
>>>> >> (Or even a multi-word spell checker, not just knowing 'correct' and
>>>> >> 'not in list', but 'correct', 'incorrect' and 'not in list')
>>>> >
>>>> > It would not be an enhancement, as this would not give new
>>>> > functionality but cripple the existing one. Also, the ability to use
>>>> > all XML syntax is extremely important to me (I use POS tags and
>>>> > regular expressions), so I wouldn't make use of the multi-word spell
>>>> > checker anyway. So we'd have to introduce a crippled syntax that would
>>>> > look a little bit different for a human being but with no meaningful
>>>> > functional change. I don't think it's worth our time.
>>>> >
>>>> > The spell checker is best for checking individual words. Just like a
>>>> > hammer, it's good for nails, and not for screws. For screws, we have a
>>>> > screwdriver. For multi-word entities, we have more refined tools, like
>>>> > tagging and disambiguation and special attributes.
>>>> >
>>>> > Best,
>>>> > Marcin
>>>> >
>>>> >
>>>> > ------------------------------------------------------------------------------
>>>> > Want excitement?
>>>> > Manually upgrade your production database.
>>>> > When you want reliability, choose Perforce.
>>>> > Perforce version control. Predictably reliable.
>>>> >
>>>> > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
>>>> > _______________________________________________
>>>> > Languagetool-devel mailing list
>>>> > [email protected]
>>>> > https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>>> >
