Thanks Marcin, that will be very useful for debugging
disambiguation rules.

There is something which I do not understand though.
Take this example with the French sentence "Les avions"
(= The planes). Both words have 2 POS tags in the
French dictionary:

$ egrep "^(les|avions)\s" lexique-dicollecte-fr-v4.4.1.txt.LT.txt
les     le      D e p
les     les     R pers obj 3 p
avions  avion   N m p
avions  avoir   V avoir ind impa 1 p
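
To list the readings per word form without eyeballing the egrep output, here is a minimal sketch. It assumes each dictionary line is "form lemma POS-tag" separated by whitespace, as in the lines above; the name group_readings and the SAMPLE string are my own, not part of LanguageTool.

```python
from collections import defaultdict


def group_readings(lines):
    """Map each word form to its list of (lemma, POS tag) readings."""
    readings = defaultdict(list)
    for line in lines:
        # Split into form, lemma, and the rest (the POS tag may contain spaces).
        parts = line.split(None, 2)
        if len(parts) == 3:
            form, lemma, tag = parts
            readings[form].append((lemma, tag.strip()))
    return dict(readings)


# The four dictionary lines quoted above.
SAMPLE = """\
les     le      D e p
les     les     R pers obj 3 p
avions  avion   N m p
avions  avoir   V avoir ind impa 1 p
"""

if __name__ == "__main__":
    for form, items in group_readings(SAMPLE.splitlines()).items():
        print(form, len(items), items)
```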

When I run LT in verbose mode, it gives:

=====================
$ echo "Les avions" |  java -jar dist/LanguageTool.jar -l fr -v
Expected text language: French
Working on STDIN...
2049 rules activated for language French
<S> Les[le/D e p]  avions[avion/N m p,</S>]<P/>
Disambiguator log:

RB-LE_LA_LES: Les[le/D e p] -> Les[le/D e p]

RP-D_N_AMBIG: avions[avoir/V avoir ind impa 1 p,avion/N m p,avoir/SENT_END] -> avions[avion/N m p,avion/SENT_END]
=====================
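
For reference, here is a minimal sketch of how such a disambiguator log could be read mechanically. It assumes each entry sits on a single line in the form "RULE_ID: token[lemma/TAG,...] -> token[lemma/TAG,...]", as in the output above, and that tags contain no commas. The names ENTRY, pos_tags and removed_tags are my own, not part of LanguageTool.

```python
import re

# Assumed entry shape: RULE_ID: token[readings] -> token[readings]
ENTRY = re.compile(r"^(\S+):\s+(.+?)\[(.*?)\]\s+->\s+(.+?)\[(.*?)\]\s*$")


def pos_tags(readings):
    """Return the POS tags of a comma-separated 'lemma/TAG' list, in order."""
    return [r.split("/", 1)[1] for r in readings.split(",") if "/" in r]


def removed_tags(log_lines):
    """Yield (rule_id, token, tags dropped by that rule) for each log entry."""
    for line in log_lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # skip blank lines and anything that is not a log entry
        rule, token, before, _, after = m.groups()
        kept = set(pos_tags(after))
        yield rule, token, [t for t in pos_tags(before) if t not in kept]


# The two entries from the log above, each joined onto one line.
SAMPLE_LOG = """\
RB-LE_LA_LES: Les[le/D e p] -> Les[le/D e p]

RP-D_N_AMBIG: avions[avoir/V avoir ind impa 1 p,avion/N m p,avoir/SENT_END] -> avions[avion/N m p,avion/SENT_END]
"""

if __name__ == "__main__":
    for rule, token, dropped in removed_tags(SAMPLE_LOG.splitlines()):
        print(rule, token, dropped)
```

On the two entries above, it reports that RP-D_N_AMBIG dropped the tag "V avoir ind impa 1 p" from "avions", and that RB-LE_LA_LES dropped nothing from "Les".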

Now, shouldn't the log show that the word "les" had 2 POS tags
before being disambiguated? It does not show that "les" also had
the POS "R pers obj 3 p" before disambiguation.

Regards
-- Dominique

On Mon, Jun 11, 2012 at 7:11 PM, Marcin Miłkowski <list-addr...@wp.pl> wrote:
> OK, I implemented this today. Note: it works only for the rule-based
> disambiguator, any other disambiguators need to add the annotation on
> their own.
>
> Regards,
> Marcin
>
> On 2012-05-28 14:08, Marcin Miłkowski wrote:
>> That requires some additions to the AnalyzedToken and multiple other
>> places. But agreed, very useful. Will think of it.
>>
>> On 28-05-2012 13:24, "Jaume Ortolà i Font" <jaumeort...@gmail.com> wrote:
>>
>>     I have the same difficulty with the disambiguation rules in Catalan.
>>     The proposed solution would be very useful.
>>
>>     Regards,
>>     Jaume Ortolà
>>     www.riuraueditors.cat
>>
>>
>>
>>
>>     2012/5/28 Dominique Pellé <dominique.pe...@gmail.com>
>>
>>         Hi
>>
>>         For languages that use the POS disambiguator (ca, el, en, eo, es,
>>         fr, gl, km, nl, pl, ro, ru), words may sometimes get the wrong POS
>>         because of an error in the disambiguation rules.
>>
>>         I find it quite difficult to find which rule(s) caused a word to
>>         get a misclassified POS. My current strategy is trial and error:
>>         I comment out parts of the resources/*/disambiguation.xml file
>>         until I find the rule(s) that caused the error. It's slow and
>>         cumbersome. I'm not sure how other people debug disambiguation
>>         rules.
>>
>>         I would find it useful to have a debug mode that prints which
>>         disambiguation rule(s), if any, matched and altered the POS of
>>         each word.
>>
>>         For example, LanguageTool gave a false positive on the following
>>         correct French sentence (I'll fix it soon):
>>         "Toutes nos félicitations" (literally "All our congratulations").
>>
>>         $ echo "Toutes nos félicitations." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l fr -v
>>         Expected text language: French
>>         Working on STDIN...
>>         2048 rules activated for language French
>>         <S> Toutes[tous/R f p,tout/D f p]  nos[no/N m p]
>>         félicitations[félicitation/N f p].[./M fin,</S>]<P/>
>>         1.) Line 1, column 1, Rule ID: ACCORD_GENRE[3]
>>         Message: « Toutes » et « nos » ne semblent pas bien accordés en genre.
>>         Toutes nos félicitations.
>>         ^^^^^^^^^^
>>
>>         The false positive happens because the word "nos" gets incorrectly
>>         tagged as a noun (N m p) instead of "D e p". By trial and error, I
>>         found that the French disambiguation rule
>>         <rule name="nom" id="N"> misclassified the POS of the word "nos"
>>         here, causing the false positive.
>>
>>         It would be easier if, in verbose mode (-v command line flag),
>>         LanguageTool displayed the disambiguation rule IDs that matched
>>         for each word. So instead of just printing...
>>
>>         nos[no/N m p]
>>
>>         ... LanguageTool could print something like this:
>>
>>         nos[no/N m p](nom)
>>
>>         ... where "nom" identifies the disambiguation rule that altered
>>         the POS tag of the word "nos" in this example. If multiple rules
>>         match, their IDs could be shown in the order in which they
>>         matched, since order matters for disambiguation.
>>
>>         Regards
>>         -- Dominique
>>
>>         
>> ------------------------------------------------------------------------------
>>         Live Security Virtual Conference
>>         Exclusive live event will cover all the ways today's security and
>>         threat landscape has changed and how IT managers can respond.
>>         Discussions
>>         will include endpoint security, mobile security and the latest
>>         in malware
>>         threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>         _______________________________________________
>>         Languagetool-devel mailing list
>>         Languagetool-devel@lists.sourceforge.net
>>         <mailto:Languagetool-devel@lists.sourceforge.net>
>>         https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>
>>
>>
>>
>
>
