Adrian Pepper writes:

> (Mailman 2.1.12, some local mods, but not around topics...)
>
> I had a utf-8 subject I was having difficulty matching with a topic regexp.
>
> Eventually I concluded the subject still had newlines in it when it was
> matched against the regexp. (That is the continuation lines were not
> joined before matching). And "." would not match the newline character(s)).
> Am I correct in my conclusion that .* won't match newline characters,
> but <space-chars><not-space-chars><linefeed><carriage-return> will ?
> (And also, that that is the character class I created).

Yes. Here are the docs for Python regular expressions as used in Mailman:
https://docs.python.org/2.7/library/re.html. In general this problem would
be addressed with the DOTALL flag:

    The special characters are:

    '.' (Dot.) In the default mode, this matches any character except a
    newline. If the DOTALL flag has been specified, this matches any
    character including a newline.

Note that the definition of "newline" here is exactly "\n". However, in
your case I think there's a simpler method.

> For production I might need to put [\s\S\n\r]* between every pair of
> characters after a reasonable point in the expression. Unless I can
> enumerate the possibilities more precisely. (Which will probably
> result in an even longer looking character class).

Well, actually what you need is just "\s*" (or perhaps "\s+" or "(\s|_)+")
wherever a space might occur in the topic regexp, I think. Line folding
can only occur at whitespace (breaking this rule would be noticed by
everybody, and so is not likely to go unfixed), and "\s" already includes
"\n".

> Empirically I see ?=\n =?utf-8?q?_ after "Weekly" and before "Ac".
> (And it seems the matching is done on the incoming subject, not the
> one formatted for resending, which, with my tag, and the utf-8
> of an incoming tag pushes the expression entirely onto the second
> line where I think the ".*" variant (or even [_ ]) would match.

That would explain your observations, but I am not familiar with the topic
code. I don't have time to address that until the weekend, and maybe not
then as $DAYJOB is piling up work on me, and Mark is on vacation in
Croatia, so you may have to wait a bit for a final answer on that.

I'm sorry about that, but I think at least for now the "\s*" bandaid will
get you most of the way to where you want to go.
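To make the difference concrete, here is a minimal sketch in plain Python. The subject strings and patterns are made up for illustration (they are not your actual header), and this only exercises the re module, not Mailman's topic-matching code, which I have not checked.

```python
import re

# Hypothetical folded subject: the continuation line starts with a space,
# so the unjoined value contains "\n " at the fold.
folded = "Weekly\n Activity Report"

# By default "." matches anything except "\n", so ".*" cannot cross the fold.
dot_default = re.search(r"Weekly.*Activity", folded)

# With DOTALL, "." also matches "\n".
dot_all = re.search(r"Weekly.*Activity", folded, re.DOTALL)

# "\s" covers space, tab, "\r" and "\n", so "\s+" spans the fold
# without needing any flag.
whitespace = re.search(r"Weekly\s+Activity", folded)

# The same holds for a CRLF fold: "." matches "\r" but still stops at "\n".
folded_crlf = "Weekly\r\n Activity Report"
crlf_dot = re.search(r"Weekly.*Activity", folded_crlf)
crlf_ws = re.search(r"Weekly\s+Activity", folded_crlf)

# "(\s|_)+" additionally accepts "_" where it stands in for a space.
underscored = "Weekly_\n Activity Report"
with_underscore = re.search(r"Weekly(\s|_)+Activity", underscored)

print(dot_default is None)          # the plain ".*" pattern fails
print(dot_all is not None)          # DOTALL makes it match
print(whitespace is not None)       # "\s+" matches as-is
print(crlf_dot is None, crlf_ws is not None)
print(with_underscore is not None)
```

Note that none of these patterns would skip over the "?=" and "=?utf-8?q?" markers themselves if the match really is done against the raw encoded header; they only deal with the whitespace at the fold.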
