Daniel Naber wrote:

> On 2015-10-08 06:59, Dominique Pellé wrote:

>> ... then the regexp rule does not detect all the errors
>> that the <pattern> rule detected. It does not detect errors
>> in "foo  bar"  (2 spaces or more, or tabs) or when there is a
>> new line as in:
>>
>>   foo
>>   bar
>>
>> How to fix it?
>
> I don't think it should be fixed, as two consecutive spaces is usually
> an error that should be fixed first.

The double space is meant to be caught by another rule.
I have text files that are indented with spaces, or sometimes
justified, so they may contain several consecutive spaces, and
they also contain newlines in the middle of sentences. For such
files, I disable WHITESPACE_RULE, but I still want to catch other
errors such as "foo  bar" in my example, which were caught before
when using <pattern><token>...</pattern>.


> Using \s+ for all spaces makes the
> regex very difficult to read.

I agree: it clutters the regexp, especially if it appears in
many places, which defeats the purpose of <regexp>, whose goal
was precisely to make rules easier to maintain.

That's why I also proposed solution 2).
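To make the readability concern concrete, here is a small Java sketch comparing a rule regexp written with plain single spaces against the same regexp with every space spelled `\s+` (option 1). The "could of been" rule is a hypothetical example, not an existing LanguageTool rule. Note also that in Java, `\s` only matches ASCII whitespace unless `Pattern.UNICODE_CHARACTER_CLASS` is set, so covering non-breaking spaces would make option 1 noisier still.

```java
import java.util.regex.Pattern;

/** Illustration only: the "could of been" rule is hypothetical. */
public final class SpacingDemo {

    // Literal single spaces: easy to read, but misses runs of spaces, tabs and newlines.
    static final Pattern LITERAL = Pattern.compile("(?:could|would) of been");

    // Option 1: every space written as \s+. Tolerates any run of ASCII
    // whitespace, but the rule is noticeably harder to read.
    static final Pattern SPACED = Pattern.compile("(?:could|would)\\s+of\\s+been");

    public static void main(String[] args) {
        String[] inputs = {
            "He could of been there.",
            "He could  of been there.",   // two spaces
            "He would of\n   been there." // newline plus indentation
        };
        for (String s : inputs) {
            System.out.printf("literal=%b  spaced=%b  %s%n",
                LITERAL.matcher(s).find(),
                SPACED.matcher(s).find(),
                s.replace("\n", "\\n"));
        }
    }
}
```

Only the first input matches the literal version; the `\s+` version matches all three, at the cost of readability.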


Purodha Blissenbach wrote:

> I suggest version 1, since 2 would alter the usual
> meaning of regular expressions which I believe is
> a bad idea.

Not necessarily: the regexp itself would stay unmodified.
Instead, the sentence could be pre-processed before matching,
replacing every run of consecutive whitespace (spaces, tabs,
newlines, and even other Unicode spaces) with a single space.
The regexp then ends up being matched against "foo bar"
(1 space) instead of "foo  bar" (2 or more spaces).
Thinking further about it, this would be my preference.
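A minimal sketch of this pre-processing idea in Java, assuming the rule regexp is applied to a plain sentence string. `WhitespaceCollapser`, `collapse` and `findInOriginal` are hypothetical names for illustration, not existing LanguageTool API. One detail the idea implies: once the sentence is collapsed, match offsets no longer line up with the original text, so the sketch keeps an offset map to report each error at its original position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collapse whitespace runs, match an unmodified regexp, map offsets back. */
public final class WhitespaceCollapser {

    /** Normalized text plus, for each of its chars, the offset in the original. */
    static final class Normalized {
        final String text;
        final int[] toOriginal; // length text.length() + 1; last entry is a sentinel

        Normalized(String text, int[] toOriginal) {
            this.text = text;
            this.toOriginal = toOriginal;
        }
    }

    /** Tabs, newlines etc. (isWhitespace) plus Unicode space separators such as U+00A0 (isSpaceChar). */
    static boolean isSpace(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    /** Replaces each run of whitespace with a single ASCII space. */
    static Normalized collapse(String original) {
        StringBuilder sb = new StringBuilder(original.length());
        int[] map = new int[original.length() + 1];
        int n = 0;
        boolean previousWasSpace = false;
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (isSpace(c)) {
                if (!previousWasSpace) { // keep only the first char of a run
                    sb.append(' ');
                    map[n++] = i;
                }
                previousWasSpace = true;
            } else {
                sb.append(c);
                map[n++] = i;
                previousWasSpace = false;
            }
        }
        map[n] = original.length(); // sentinel for empty matches at the end
        return new Normalized(sb.toString(), Arrays.copyOf(map, n + 1));
    }

    /** Runs the unmodified regexp on the collapsed text; returns [start, end) pairs in the original. */
    static List<int[]> findInOriginal(Pattern rule, String original) {
        Normalized norm = collapse(original);
        List<int[]> hits = new ArrayList<>();
        Matcher m = rule.matcher(norm.text);
        while (m.find()) {
            int from = norm.toOriginal[m.start()];
            // Exclusive end: original offset of the last matched char, plus one.
            int to = m.end() > m.start() ? norm.toOriginal[m.end() - 1] + 1 : from;
            hits.add(new int[] {from, to});
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern rule = Pattern.compile("foo bar"); // single space, exactly as written in the rule
        String text = "x foo  bar y\nfoo\n  bar";
        for (int[] hit : findInOriginal(rule, text)) {
            String original = text.substring(hit[0], hit[1]);
            System.out.println(hit[0] + "-" + hit[1] + ": " + original.replace("\n", "\\n"));
        }
    }
}
```

Running `main` reports two hits, at offsets 2-10 ("foo  bar") and 13-22 ("foo", newline, two spaces, "bar") of the original text, even though the regexp contains a single literal space.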

Regards
Dominique

------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
