I understand why you want to preprocess text. Sometimes, I have a similar problem. Sometimes, I want to ignore multiple spaces, line breaks, and tab characters.
However, automatically ignoring such text could cause problems. For example, not all double spaces are errors. For the Netherlands, "there should be a double space between the postcode and the post town" (http://www.royalmail.com/personal/help-and-support/Addressing-your-items-Western-Europe).

I did not mean that you should not preprocess text. I meant that you should not mess with the meaning of a regexp.

Possibly, we can solve the conflict by having 2 types of <regexp>:

<regexp type="exact-meaning">
<regexp type="smart">

Regards,

Mike Unwalla
Contact: www.techscribe.co.uk/techw/contact.htm

-----Original Message-----
From: Dominique Pellé [mailto:dominique.pe...@gmail.com]
Sent: 09 October 2015 06:33

Mike Unwalla <m...@techscribe.co.uk> wrote:

> I agree with Purodha. Do not be 'smart'. Do not change the meaning of a regexp.
>
> Regards,
>
> Mike Unwalla

OK. It looks like the majority does not want to pre-process the sentence to remove consecutive spaces (including tabs, DOS/Unix newlines, form feeds, vertical space, non-breaking space) before matching the regexp. So I will go with that.

On the other hand, nobody has indicated how to avoid the regression. A line break in between words, for example, typically doesn't happen in LibreOffice documents or in our tests, but often happens in text files. In emails, line breaks are used to avoid lines longer than ~80 characters. Taking the German rule GIRLS_DAY for example, it will now fail to match when "girl's
day" is on a broken line as in this sentence. I see this as a severe regression.

<snip>

Regards
Dominique
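
To make the trade-off discussed above concrete, here is a minimal Python sketch of the two behaviours. It is only an illustration, not LanguageTool code: the function names `match_exact` and `match_smart` are hypothetical and simply mirror the proposed "exact-meaning" and "smart" types, and the example strings are made up.

```python
import re

# Hypothetical helpers for illustration only; not part of LanguageTool.
WHITESPACE_RUN = re.compile(r"\s+")


def match_exact(pattern: str, text: str) -> bool:
    """Apply the regexp to the text exactly as written ("exact-meaning")."""
    return re.search(pattern, text) is not None


def match_smart(pattern: str, text: str) -> bool:
    """Collapse every whitespace run to one space, then match ("smart")."""
    normalized = WHITESPACE_RUN.sub(" ", text)
    return re.search(pattern, normalized) is not None


# A phrase split by a line break, as often happens in emails and text files.
wrapped = "We celebrate girl's\nday every year."
# An address where the double space is intentional (assumed example data).
address = "1234 AB  Amsterdam"

# The regression: a literal space in the pattern does not match a line break.
print(match_exact(r"girl's day", wrapped))            # False
print(match_smart(r"girl's day", wrapped))            # True

# The cost of being "smart": a pattern that needs two spaces stops matching.
print(match_exact(r"\d{4} [A-Z]{2}  \w+", address))   # True
print(match_smart(r"\d{4} [A-Z]{2}  \w+", address))   # False

# A rule author can also opt in per rule, without any preprocessing.
print(match_exact(r"girl's\s+day", wrapped))          # True
```

The last line shows a third option that leaves the meaning of the regexp untouched: the rule itself uses `\s+` where any whitespace is acceptable.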