Just to be clear, others concur that with 3 URIs or fewer it works, and
with 4 or more it fails. The problem exists in trunk as well.

It's also inconsistent depending on the platform.

I am not sure whether it is a Perl bug, an SA bug, or something we are
doing wrong, but it is a blocker.

--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Sat, Jun 23, 2018 at 10:15 PM, Bill Cole <
[email protected]> wrote:

> On 22 Jun 2018, at 14:29 (-0400), Kevin A. McGrail wrote:
>
>> Hi All,
>>
>> 3.4 is not passing tests for me with idn_dots.t, and it appears to point
>> to a problem in P:M:S::get_uri_list. I'm bleary from looking at this for
>> three days. Can someone take a look at this?
>>
>> If you modify t/idn_dots.t to print the URI list from the generated
>> message in the test, it fails in 3.4 but passes in trunk and in the 3.4.1
>> release. See below for output, but basically there is a missing URI, which
>> is why the test correctly fails.
>>
>
> I have made the test work by adding "use utf8" to the test script. This is
> just avoiding the underlying subtle bug.
>
> The breakage is only seen (so far) on the Red Hat Perl 5.16.3 packaged for
> EL7 and derivatives. I believe that 5.16.x was the last major release to
> NOT work in UTF-8 by default without "use utf8" explicitly used. I have
> replicated the incorrect parse with the spamassassin script and a message
> with all-ASCII URLs, so the problem is somewhere in the spectacularly
> complicated RE that extracts URIs from the cooked text array inside
> PerMsgStatus->get_uri_detail_list. Making matters worse, if I run either
> t/idn_dots.t or spamassassin with the Perl debugger (-d), the parse works.
>
> Anyone who is still using an even older Perl could assist simply by
> confirming that the 3.4 branch from SVN fails subtest 4 of t/idn_dots.t if
> you remove or comment out the "use utf8" line I added to that file today.
>
> It would be interesting to see if the problem would be solved by adding
> "use utf8" to every .pm that had a "use bytes" declaration before 2017.
> This is a bit of a shotgun approach but simpler than hunting for the
> specific issue. I'd try it myself, but I'm basically on my last
> stealable minute for the weekend already.
>
>
> --
> Bill Cole
> [email protected] or [email protected]
> (AKA @grumpybozo and many *@billmail.scconsult.com addresses)
> Currently Seeking Steadier Work: https://linkedin.com/in/billcole
>
