Just to be clear, others concur that with three URIs or fewer it works, and with four or more it fails. It's inconsistent and exists in trunk as well.
It's inconsistent depending on the platform as well. I am not sure whether it is a Perl bug, an SA bug, or something we are doing wrong, but it is a blocker.

--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Sat, Jun 23, 2018 at 10:15 PM, Bill Cole <[email protected]> wrote:

> On 22 Jun 2018, at 14:29 (-0400), Kevin A. McGrail wrote:
>
>> Hi All,
>>
>> 3.4 is not passing tests for me with the idn_dots.t and it appears to
>> point to a problem in P:M:S::get_uri_list. I'm bleary from looking at
>> this for three days. Can someone take a look at this?
>>
>> If you modify the t/idn_dots to print the uri list from the generated
>> message in the test, it fails in 3.4 but passes in Trunk and in the 3.4.1
>> release. See below for output but basically there is a missing URI which
>> is why the test correctly fails.
>
> I have made the test work by adding "use utf8" to the test script. This is
> just avoiding the underlying subtle bug.
>
> The breakage is only seen (so far) on the RedHat perl 5.16.3 packaged for
> EL7 and derivatives. I believe that 5.16.x was the last major release to
> NOT work in UTF-8 by default without "use utf8" explicitly used. I have
> replicated the incorrect parse with the spamassassin script and a message
> with all-ascii URLs, so the problem is somewhere in the spectacularly
> complicated RE that extracts URIs from the cooked text array inside
> PerMsgStatus->get_uri_detail_list. Making matters worse, if I run either
> t/idn_dots.t or spamassassin with the Perl debugger (-d) the parse works.
>
> Anyone who is still using an even older Perl could assist simply by
> confirming that the 3.4 branch from SVN fails subtest 4 of t/idn_dots.t if
> you remove or comment out the "use utf8" line I added to that file today.
>
> It would be interesting to see if the problem would be solved by adding
> "use utf8" to every .pm that had a "use bytes" declaration before 2017.
> This is a bit of a shotgun approach but simpler than hunting for the
> specific issue. I'd try it myself, but I'm basically on my last
> stealable minute for the weekend already.
>
> --
> Bill Cole
> [email protected] or [email protected]
> (AKA @grumpybozo and many *@billmail.scconsult.com addresses)
> Currently Seeking Steadier Work: https://linkedin.com/in/billcole
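
For anyone following the "use utf8" part of the thread, here is a minimal sketch of what that pragma changes for a literal in a test script: the same text seen as raw UTF-8 bytes versus decoded characters, matched against a character class of IDN dot separators. The host name and the `$idn_dot` pattern are made up for illustration and are not taken from t/idn_dots.t or from PerMsgStatus. Since the bad parse was also replicated with all-ASCII URLs, this only shows why adding "use utf8" can change the test's behavior; it does not explain the underlying bug.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for a host name that uses U+3002 (ideographic full stop)
# as a label separator. Hypothetical example, not from t/idn_dots.t.
my $bytes = "http://example\xE3\x80\x82test/";

# The same text as a character string, which is what a UTF-8 source
# literal becomes when "use utf8" is in effect.
my $chars = decode('UTF-8', $bytes);

# Characters that IDNA treats as dots (illustrative pattern only).
my $idn_dot = qr/[.\x{3002}\x{FF0E}\x{FF61}]/;

for my $case ([bytes => $bytes], [chars => $chars]) {
    my ($name, $str) = @$case;
    printf "%-5s length=%d idn_dot_match=%s\n",
        $name, length($str), ($str =~ $idn_dot ? 'yes' : 'no');
}
```

The byte string never matches the pattern because Perl sees three separate code points (0xE3, 0x80, 0x82) rather than U+3002, so any code that splits on IDN dots behaves differently depending on which form it is handed.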
