http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5056
------- Additional Comments From [EMAIL PROTECTED] 2006-12-26 17:39 -------
Theo, kindly keeper of the assassinator of spam wrote:
> Linda, "corruptor" of programs, and occasional Pain in the *** wrote:
>> While Text::Wrap did change, the change uncovered an SA bug.
T> I disagree, please see below.
>> The pattern '(?<=[\s,])' (Passed in the line:
>> ./lib/Mail/SpamAssassin/PerMsgStatus.pm:996: $hdr =
>> Mail::SpamAssassin::Util::wrap($hdr, "\t", "", 79, 0, '(?<=[\s,])');
>> is used with a *.
T> Due to a change in Text::Wrap, yes.
>> What may have been meant here was simply '[\s,]'.
T> Nope. Originally that's what it was. However, it caused bug 2165. The
T> solution (r5605 if anyone's interested) was to use the zero width assertion
T> which avoided removing the break character.
---
I was afraid it was something like that. But....
T> Unless I'm missing something in the Text::Wrap docs, there's no other way to
say
T> "break on this character, but don't remove it from the string".
---
Line::Break modules states:
"It is possible to control which characters terminate words by modifying
$Text::Wrap::break. Set this to a string such as '[\s:]' (to break
before spaces or colons) or a pre-compiled regexp such as "qr/[\s']/"
(to break before spaces or apostrophes). The default is simply '\s';
that is, words are terminated by spaces."
Unless I read it improperly, "$break" must contain characters.
A zero width assertion doesn't (as much as it would be interesting)
qualify as a character in any character set. :-)
Having it, both correspond to the break character, and having the code
swallow extra break characters makes sense, in the general scheme of things: the
code would look for a break character "break character" when it is time to
"break" by using '(:$break)*'. That would swallow up extra break characters
after a printing line.
Looking at the code for Text::Wrap (TW), I don't believe it was designed
to accept a zero-width pattern. By not passing in a match for 1 or more
characters, I believe SA is calling Text::Width with illegal parameters.
No matter if it is decided to maintain "SA"'s call to TW with an invalid
parameter or if it is fixed, the patch hiding a *valid* perl warning is bad
form.
Why is "," needed as a breaking character? Would it work (i.e. not cause
bug 2165) if you just used '\s'?
Regarding the warning, it is a valid warning about broken/bad perl code.
The broken code:
'(zero width assertion)<unbounded wildcard>' is invalid / undefined perl code.
I wouldn't be certain what it would do -- I suppose we should feel fortunate
that perl is smart enough to catch it -- since when trying to match a pattern, a
pattern like '(zero-width-pat)*' should repeat indefinitely.
If SA needs to "fork" the Text::Wrap code from 0705 to solve SA's
need, that would be one "valid" solution, but hiding away valid warnings about
invalid perl regular expressions or "useless use of infinitely-repeating
zero-width assertion" is just hiding a bug and asking for trouble.
Maybe Text::Wrap does need to be forked to provide the, functionality that SA
needs?
But please, don't shut off a valid warning. Anyone linking with TW after the
0705 version will silently be running the bad perl code.
It is technically experimental, but its been there, I believe since 5.8.1,
maybe the construct '([\s,]+)(?{$mybreakchars=$^N})' would provide a workaround?
You'd still have the "\s," pattern, and for any given invocation of TW, you'd
save the value of the break char such that it could be inserted back in?
If Text::Wrap might execute that line multiple times, maybe:
'([\s,]+)(?{push @mybreakchars,$^N})' would allow pushing on successive
break characters for each time it's used.
It sounds potentially ugly/messy. Is it possible just to ask for zero-width
assertion "support" in the Text::Width function? The code in 0705 was
technically "broken", in that it didn't properly use the value in "$break" as
the value to break-on -- it used '\s'. It still would have swallowed the break
char though.
*sigh*
-l
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.