Hi

For languages that use the POS disambiguator (ca, el, en, eo, es,
fr, gl, km, nl, pl, ro, ru),
words may sometimes get the wrong POS because of an error in the
disambiguation rules.

I find it quite difficult to work out which rule(s) caused a word to
get a misclassified POS.
My current strategy is trial and error: I comment out parts of
the resources/*/disambiguation.xml
file until I find the rule(s) that caused the error.  It's slow
and cumbersome.  I'm not sure
how other people debug disambiguation rules.
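
The trial-and-error approach could be partly automated. Below is a
minimal Python sketch, not part of LanguageTool: it takes the text of a
disambiguation file and produces one variant per <rule> element, each
with that single rule removed. The function name leave_one_out and the
toy XML are assumptions made for illustration, and a real
disambiguation.xml may contain constructs this sketch does not handle.

```python
import copy
import xml.etree.ElementTree as ET


def leave_one_out(xml_text):
    """Yield (rule label, XML text) pairs, each variant missing one <rule>."""
    root = ET.fromstring(xml_text)
    rule_count = sum(1 for _ in root.iter("rule"))
    for index in range(rule_count):
        variant = copy.deepcopy(root)
        # ElementTree has no parent pointers, so build a child -> parent map.
        parents = {child: parent for parent in variant.iter() for child in parent}
        target = list(variant.iter("rule"))[index]
        parents[target].remove(target)
        label = target.get("id") or target.get("name")
        yield label, ET.tostring(variant, encoding="unicode")


# Toy input; the rule bodies are placeholders, not real LanguageTool rules.
SAMPLE = """<rules>
  <rule name="nom" id="N"><pattern/></rule>
  <rule name="other" id="X"><pattern/></rule>
</rules>"""

variants = list(leave_one_out(SAMPLE))
for label, _ in variants:
    print("variant without rule:", label)
```

Each variant would still have to be written to disk and run through
LanguageTool by hand to see whether the wrong tag disappears, so this
only removes the manual editing step.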

I would find it useful to have a debug mode which prints the rule(s)
that matched (if any)
and altered the POS of each word.

For example, LanguageTool gave a false positive on the following
correct French sentence (I'll fix it soon):  "Toutes nos félicitations"
(literally "All our congratulations")

$ echo "Toutes nos félicitations." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l fr -v
Expected text language: French
Working on STDIN...
2048 rules activated for language French
<S> Toutes[tous/R f p,tout/D f p]  nos[no/N m p] félicitations[félicitation/N f p].[./M fin,</S>]<P/>
1.) Line 1, column 1, Rule ID: ACCORD_GENRE[3]
Message: « Toutes » et « nos » ne semblent pas bien accordés en genre.
Toutes nos félicitations.
^^^^^^^^^^

The false positive happens because the word "nos" gets incorrectly tagged
as a noun (N m p) instead of "D e p".  By trial and error, I found that it's the
French disambiguation rule <rule name="nom" id="N"> which misclassified
the POS of the word "nos" here (causing the false positive).

It would be easier if, in verbose mode (-v command line flag), LanguageTool
displayed the disambiguation rule IDs that matched for each word.  So
instead of just printing...

nos[no/N m p]

... LanguageTool could print something like this:

nos[no/N m p](nom)

... where "nom" is the disambiguation rule ID that altered the POS tag of the
word "nos" in this example (multiple rule IDs could be shown if multiple
rules are matched, in the order that they are matched, since order matters
for disambiguation).
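
To make the idea concrete, here is a small Python sketch of the
proposed trace. It is not LanguageTool code: the Token class, the
apply_rules and format_token functions, the keep_nouns stand-in for
the "nom" rule, and the initial "notre/D e p" reading are all
assumptions made for illustration. Rules are applied in order, and
each token remembers the labels of the rules that changed its readings.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    readings: list                              # e.g. ["no/N m p", "notre/D e p"]
    trace: list = field(default_factory=list)   # labels of rules that changed the readings


def apply_rules(tokens, rules):
    """Apply (label, function) pairs in order and record which rules changed a token."""
    for label, rule in rules:
        for i, token in enumerate(tokens):
            new_readings = rule(tokens, i)
            if new_readings != token.readings:
                token.readings = new_readings
                token.trace.append(label)
    return tokens


def format_token(token):
    """Render a token as in verbose output, plus the rule trace if there is one."""
    out = f"{token.text}[{','.join(token.readings)}]"
    if token.trace:
        out += f"({','.join(token.trace)})"
    return out


def keep_nouns(tokens, i):
    # Hypothetical stand-in for the real "nom" rule: keep only noun readings, if any.
    nouns = [r for r in tokens[i].readings if "/N " in r]
    return nouns or tokens[i].readings


tokens = [
    Token("Toutes", ["tous/R f p", "tout/D f p"]),
    Token("nos", ["no/N m p", "notre/D e p"]),   # assumed readings before disambiguation
    Token("félicitations", ["félicitation/N f p"]),
]
apply_rules(tokens, [("nom", keep_nouns)])
print(" ".join(format_token(t) for t in tokens))
```

Only tokens whose readings were actually changed get a trace, so the
output stays identical to the current one wherever no rule fired.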

Regards
-- Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
