Jaume Ortolà i Font <jaumeort...@gmail.com> wrote:

2015-09-05 16:11 GMT+02:00 Daniel Naber <daniel.na...@languagetool.org>:
>
>> On 2015-09-04 23:21, Dominique Pellé wrote:
>>
>> > I wish I could write a rule pattern like this:
>> >
>> >       <tokens>plein temps#chaque fois#rude épreuve#vol d’oiseau</tokens>
>>
>> What about a more radical approach (which would be trickier to
>> implement):
>>
>> <token>a</token>
>> <regex>plein temps|chaque fois|rude épreuve|vol d’oiseau</regex>
>>
>
> Or even more general: sometimes I wish I could write rules with regular
> expressions that completely ignore tokenization and take the whole
> sentence as a string.
>
> In the case of Dominique's rule it would be something like:
>
> search: a (plein temps|chaque fois|rude épreuve|vol d’oiseau)
> and suggest replacing with: à $1
>


Yes, I was also thinking about that, but I did not dare propose it :-)
My concern was that tokenization is needed for performance, but
maybe that's not true.

It is similar to what Daniel wrote earlier as well:

<regex>a (plein temps|chaque fois|rude épreuve|vol d’oiseau)</regex>

It would make some such rules a lot simpler to write and more concise.
Matching without tokenization is what Lightproof and Grammalecte do.
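
To make the idea concrete, here is a minimal sketch in Python of what such
a sentence-level rule could do. It is not LanguageTool code: the names
PATTERN and suggest are made up for illustration, and the word boundaries
(\b) are my own addition, since without tokenization the regex would
otherwise also match the "a" at the end of a word such as "va".

```python
import re

# Hypothetical sentence-level rule, mirroring the "search ... and suggest
# replacing with: à $1" idea above. The phrase list is the one from the thread.
PATTERN = re.compile(
    r"\ba (plein temps|chaque fois|rude épreuve|vol d’oiseau)\b"
)

def suggest(sentence: str) -> str:
    """Return the sentence with 'a' replaced by 'à' before the listed phrases."""
    # \1 is Python's spelling of the $1 back-reference used in the thread.
    return PATTERN.sub(r"à \1", sentence)

print(suggest("Il travaille a plein temps."))
```

This sketch only handles a lowercase "a"; a real rule would also need to
deal with case and with reporting match positions rather than rewriting
the sentence.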

Regards
Dominique
------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
