Hi
Say I want to detect the word "a" (= has, verb) used by mistake
instead of "à" (= at, preposition) in the many French expressions
where this happens, such as:
a nouveau -> à nouveau
a plein temps -> à plein temps
a rude épreuve -> à rude épreuve
a vol d'oiseau -> à vol d'oiseau
etc.
I wish I could write a rule pattern like this:
<rule>
<pattern>
<marker><token>a</token></marker>
<tokens>plein temps#chaque fois#rude épreuve#vol d’oiseau</tokens>
</pattern>
...
</rule>
Notice the <tokens> tag, with an 's', instead of <token>.
The # characters and the spaces inside <tokens>...#...#...</tokens>
would be interpreted automatically, so that the above rule
is equivalent to this much more verbose set of rules:
<rule>
<pattern>
<marker><token>a</token></marker>
<token>plein</token>
<token>temps</token>
</pattern>
</rule>
<rule>
<pattern>
<marker><token>a</token></marker>
<token>chaque</token>
<token>fois</token>
</pattern>
</rule>
<rule>
<pattern>
<marker><token>a</token></marker>
<token>rude</token>
<token>épreuve</token>
</pattern>
</rule>
<rule>
<pattern>
<marker><token>a</token></marker>
<token>vol</token>
<token>d</token>
<token>’</token>
<token>oiseau</token>
</pattern>
</rule>
In other words:
* Each # character inside <tokens>...#...#...</tokens> starts
a new <rule>.
* The spaces inside <tokens>...</tokens> cause automatic
tokenization, so that something like <tokens>rude épreuve</tokens>
is automatically interpreted as <token>rude</token><token>épreuve</token>.
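The expansion described above could be prototyped as a preprocessing step
outside LanguageTool. The following is only a minimal Python sketch under
stated assumptions: the function names (split_tokens, expand_tokens) are
hypothetical, and the small regular expression merely stands in for
LanguageTool's real tokenizer, which may split text differently. It
generates only the <pattern> part of each rule.

```python
import re
from xml.sax.saxutils import escape

# Assumption: a token is either a run of word characters or a single
# punctuation character. This only approximates a real tokenizer.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def split_tokens(phrase):
    """Split one phrase such as 'vol d’oiseau' into individual tokens."""
    return _TOKEN_RE.findall(phrase)


def expand_tokens(marker_word, spec):
    """Expand a '#'-separated <tokens> body into one rule per phrase.

    marker_word is the word placed inside <marker> (here "a");
    spec is the text that would sit between <tokens> and </tokens>.
    Returns a list of XML strings, one per generated rule.
    """
    rules = []
    for phrase in spec.split("#"):
        phrase = phrase.strip()
        if not phrase:
            continue  # ignore empty alternatives such as 'a##b'
        lines = [
            "<rule>",
            "<pattern>",
            "<marker><token>%s</token></marker>" % escape(marker_word),
        ]
        lines += ["<token>%s</token>" % escape(t) for t in split_tokens(phrase)]
        lines += ["</pattern>", "</rule>"]
        rules.append("\n".join(lines))
    return rules


if __name__ == "__main__":
    spec = "plein temps#chaque fois#rude épreuve#vol d’oiseau"
    print("\n".join(expand_tokens("a", spec)))
```

Run as a script, it prints four rules, one per phrase. With this stand-in
tokenizer, "d’oiseau" is split into three tokens because the apostrophe is
punctuation, not because of the space.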
I'm curious whether rule maintainers would find it useful.
Regards
Dominique
_______________________________________________
Languagetool-devel mailing list
https://lists.sourceforge.net/lists/listinfo/languagetool-devel