I have the same difficulty with the disambiguation rules in Catalan. The
proposed solution would be very useful.
Regards,
Jaume Ortolà
www.riuraueditors.cat
2012/5/28 Dominique Pellé <dominique.pe...@gmail.com>
> Hi
>
> For languages that use the POS disambiguator (ca, el, en, eo, es,
> fr, gl, km, nl, pl, ro, ru),
> words may sometimes get the wrong POS because of an error in the
> disambiguation rules.
>
> I find it quite difficult to find which rule(s) caused a word to get a
> misclassified POS.
> My current strategy is trial and error: I comment out parts of
> the resources/*/disambiguation.xml
> file until I find the rule(s) that caused the error. It's slow
> and cumbersome. I'm not sure
> how other people debug disambiguation rules.
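The trial-and-error search described above could be partly automated by bisection. Below is a minimal Python sketch, not existing LanguageTool functionality. It assumes a single culprit rule that acts independently of the others (which may not hold, since rule order matters). `is_wrong` is a hypothetical callback that reruns the tagger with only the given rules enabled and reports whether the wrong tag still appears. The rule IDs other than "N" are made up.

```python
from typing import Callable, List, Sequence


def find_culprit(rule_ids: Sequence[str],
                 is_wrong: Callable[[List[str]], bool]) -> str:
    """Bisect rule_ids down to the one rule that triggers the wrong tag.

    Assumption: exactly one rule is responsible, and is_wrong(enabled)
    returns True when the wrong POS tag appears with only `enabled` active.
    """
    candidates = list(rule_ids)
    while len(candidates) > 1:
        first_half = candidates[: len(candidates) // 2]
        if is_wrong(first_half):
            # The wrong tag is still produced, so the culprit is in this half.
            candidates = first_half
        else:
            # Otherwise it must be among the remaining rules.
            candidates = candidates[len(first_half):]
    return candidates[0]


# Stand-in for a real tagger run: here rule "N" is the culprit.
culprit = find_culprit(["A", "B", "N", "V", "D"],
                       lambda enabled: "N" in enabled)
```

This needs about log2(n) tagger runs instead of n, but it is only as reliable as the single-culprit assumption.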
>
> I would find it useful to have a debug mode that prints which rule(s),
> if any, matched and altered the POS of each word.
>
> For example, LanguageTool gave a false positive on the following
> correct French sentence (I'll fix it soon): "Toutes nos félicitations"
> (literally: "All our congratulations").
>
> $ echo "Toutes nos félicitations." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l fr -v
> Expected text language: French
> Working on STDIN...
> 2048 rules activated for language French
> <S> Toutes[tous/R f p,tout/D f p] nos[no/N m p] félicitations[félicitation/N f p].[./M fin,</S>]<P/>
> 1.) Line 1, column 1, Rule ID: ACCORD_GENRE[3]
> Message: « Toutes » et « nos » ne semblent pas bien accordés en genre.
> Toutes nos félicitations.
> ^^^^^^^^^^
>
> The false positive happens because the word "nos" gets incorrectly tagged
> as a noun (N m p) instead of "D e p". By trial and error, I found that
> the French disambiguation rule <rule name="nom" id="N"> is the one that
> misclassified the POS of the word "nos" here (causing the false positive).
>
> It would be easier if, in verbose mode (-v command line flag), LanguageTool
> displayed the disambiguation rule IDs that matched for each word. So
> instead of just printing...
>
> nos[no/N m p]
>
> ... LanguageTool could print something like this:
>
> nos[no/N m p](nom)
>
> ... where "nom" identifies the disambiguation rule that altered the POS tag
> of the word "nos" in this example (multiple rule IDs could be shown if
> multiple rules match, in the order in which they match, since order
> matters for disambiguation).
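To make the proposed output concrete, here is a small Python sketch of the bookkeeping it implies: each token remembers, in order, the rules that changed its readings, and the verbose printer appends them in parentheses. This is only an illustration, not LanguageTool code. The `Token` class, its `apply` method and the initial readings of "nos" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    text: str
    readings: List[str]
    # Names of the rules that changed the readings, in application order.
    applied_rules: List[str] = field(default_factory=list)

    def apply(self, rule_name: str, new_readings: List[str]) -> None:
        """Record rule_name only if it actually changes the readings."""
        if new_readings != self.readings:
            self.readings = list(new_readings)
            self.applied_rules.append(rule_name)


def format_token(tok: Token) -> str:
    """Render a token as text[readings], plus (rules) when any rule fired."""
    out = f"{tok.text}[{','.join(tok.readings)}]"
    if tok.applied_rules:
        out += f"({','.join(tok.applied_rules)})"
    return out


# Made-up initial readings, for illustration only.
nos = Token("nos", ["no/N m p", "nos/D e p"])
nos.apply("nom", ["no/N m p"])   # the rule that wrongly keeps only the noun
line = format_token(nos)          # "nos[no/N m p](nom)"
```

Recording a rule only when it changes the readings keeps the output short. Rules that match without altering anything would not be listed.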
>
> Regards
> -- Dominique
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>