Hi,

For languages that use the POS disambiguator (ca, el, en, eo, es, fr, gl, km, nl, pl, ro, ru), words may sometimes get the wrong POS tag because of an error in the disambiguation rules.
I find it quite difficult to find which rule(s) caused a word to get a misclassified POS. My current strategy is trial and error: I comment out parts of the resources/*/disambiguation.xml file until I find the rule(s) that caused the error. It's slow and cumbersome. I'm not sure how other people debug disambiguation rules.

I would find it useful to have a debug mode which prints which rule(s), if any, matched and altered the POS of each word.

For example, LanguageTool gave a false positive in the following correct French sentence (I'll fix it soon): "Toutes nos félicitations" (literally = All our congratulations)

  $ echo "Toutes nos félicitations." | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l fr -v
  Expected text language: French
  Working on STDIN...
  2048 rules activated for language French
  <S> Toutes[tous/R f p,tout/D f p] nos[no/N m p] félicitations[félicitation/N f p].[./M fin,</S>]<P/>
  1.) Line 1, column 1, Rule ID: ACCORD_GENRE[3]
  Message: « Toutes » et « nos » ne semblent pas bien accordés en genre.
  Toutes nos félicitations.
  ^^^^^^^^^^

The false positive happens because the word "nos" gets incorrectly tagged as a noun (N m p) instead of "D e p". By trial and error, I found that it's the French disambiguation rule <rule name="nom" id="N"> which misclassified the POS of the word "nos" here, causing the false positive.

It would be easier if, in verbose mode (-v command line flag), LanguageTool displayed the disambiguation rule IDs that matched for each word. So instead of just printing...

  nos[no/N m p]

... LanguageTool could print something like this:

  nos[no/N m p](nom)

... where "nom" is the disambiguation rule ID that altered the POS tag of the word "nos" in this example. Multiple rule IDs could be shown if multiple rules matched, in the order in which they matched, since order matters for disambiguation.
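To make the proposed output format concrete, here is a minimal sketch in Python of how a per-token rule trace could be recorded and printed. It is not LanguageTool code: the names Token, apply_rule and format_token are hypothetical, and the starting readings for "nos" (including the lemma "notre") are an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """A word with its candidate readings as (lemma, POS tag) pairs."""
    text: str
    readings: list
    # IDs of the disambiguation rules that changed this token, in match order.
    changed_by: list = field(default_factory=list)


def apply_rule(token, rule_id, new_readings):
    """Replace the readings and record the rule ID only if something changed."""
    if new_readings != token.readings:
        token.readings = list(new_readings)
        token.changed_by.append(rule_id)


def format_token(token):
    """Render a token in the proposed verbose style, e.g. nos[no/N m p](nom)."""
    readings = ",".join(f"{lemma}/{pos}" for lemma, pos in token.readings)
    trace = f"({','.join(token.changed_by)})" if token.changed_by else ""
    return f"{token.text}[{readings}]{trace}"


# Hypothetical starting readings for "nos"; the real tagger output may differ.
nos = Token("nos", [("no", "N m p"), ("notre", "D e p")])
# Simulate the rule "nom" keeping only the noun reading.
apply_rule(nos, "nom", [("no", "N m p")])
print(format_token(nos))  # prints: nos[no/N m p](nom)
```

Recording the ID only when the readings actually change keeps the trace short, and appending in match order preserves the ordering information that matters for disambiguation.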
Regards
-- Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel