I understand why you want to preprocess text. Sometimes, I have a similar
problem. Sometimes, I want to ignore multiple spaces, line breaks, and tab
characters.

However, automatically ignoring such text could cause problems. For example,
not all double spaces are errors. For the Netherlands, "there should be a
double space between the postcode and the post town"
(http://www.royalmail.com/personal/help-and-support/Addressing-your-items-Western-Europe).

I did not mean that you should not preprocess text. I meant that you should
not mess with the meaning of a regexp.

Possibly, we can solve the conflict by having two types of <regexp>:
<regexp type="exact-meaning">
<regexp type="smart">
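A minimal sketch of how the two types might differ, written in Python only
for illustration. The function name find_matches, the regexp_type parameter,
and the sample address are hypothetical; this is not how LanguageTool
implements anything. It assumes that the "smart" type collapses each run of
whitespace in the text to one space before matching, and that the
"exact-meaning" type matches the text unchanged.

```python
import re


def find_matches(pattern, text, regexp_type="exact-meaning"):
    """Return the substrings of text that match pattern.

    regexp_type is a hypothetical stand-in for the proposed attribute.
    """
    if regexp_type == "smart":
        # Assumption: "smart" collapses every run of whitespace
        # (spaces, tabs, line breaks) to one space before matching.
        text = re.sub(r"\s+", " ", text)
    elif regexp_type != "exact-meaning":
        raise ValueError("unknown regexp type: " + regexp_type)
    return [m.group(0) for m in re.finditer(pattern, text)]


# Illustrative address with an intentional double space after the postcode.
address = "1012 AB  Amsterdam"
double_space = r" {2,}"

exact = find_matches(double_space, address, "exact-meaning")
smart = find_matches(double_space, address, "smart")
```

With the exact type, the pattern still sees the double space. With the smart
type, the double space is gone before the pattern is applied, so a rule that
depends on it can no longer match.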

Regards,

Mike Unwalla
Contact: www.techscribe.co.uk/techw/contact.htm 

-----Original Message-----
From: Dominique Pellé [mailto:dominique.pe...@gmail.com] 
Sent: 09 October 2015 06:33

Mike Unwalla <m...@techscribe.co.uk> wrote:
> I agree with Purodha. Do not be 'smart'. Do not change the meaning of a
> regexp.
>
> Regards,
>
> Mike Unwalla


OK. It looks like the majority does not want to preprocess the sentence
to remove consecutive spaces (including tabs, DOS/Unix newlines, form
feeds, vertical space, non-breaking spaces) before matching the regexp.
So I will go with that.

On the other hand, nobody has indicated how to avoid the regression. A
line break between words, for example, typically doesn't happen in
LibreOffice documents or in our tests, but often happens in text files. In
emails, line breaks are used to avoid lines longer than ~80 characters. Taking
the German rule GIRLS_DAY for example, it will now fail to match when "girl's
day" is on a broken line, as in this sentence. I see this as a severe
regression.
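The regression can be reproduced with a small sketch, written in Python only
for illustration. The patterns below are hypothetical stand-ins, not the
actual GIRLS_DAY rule, and the sample sentences are invented. The sketch
assumes that the rule's pattern contains a literal space between the two
words.

```python
import re

# Hypothetical stand-in for a rule pattern with a literal space.
literal = re.compile(r"girl's day", re.IGNORECASE)

# Possible workaround inside the pattern itself: accept any whitespace run.
flexible = re.compile(r"girl's\s+day", re.IGNORECASE)

one_line = "We take part in girl's day every year."
wrapped = "We take part in girl's\nday every year."

# The literal pattern matches only when both words are on the same line.
literal_on_one_line = literal.search(one_line) is not None
literal_on_wrapped = literal.search(wrapped) is not None

# The flexible pattern matches in both cases.
flexible_on_one_line = flexible.search(one_line) is not None
flexible_on_wrapped = flexible.search(wrapped) is not None
```

Writing \s+ in each affected rule keeps the meaning of the regexp explicit,
at the cost of having to update every rule that contains a literal space.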

<snip>

Regards
Dominique

------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
