I have a system on the build farm (CentOS 5.9) which is failing the doc build. It fails because of a global environment variable we set on a class of machines, one of which runs the farm client: PERL_UNICODE, which is set automatically. Since it's global, it is applied when each perl process starts, and it affects every filehandle that passes data back and forth, setting up some or all of the standard I/O to expect a particular encoding.
This causes a problem during the portion of the doc build which makes the spec.txt. Normally the "w3m -dump" command generates some non-ASCII characters (boxes/borders) and it's piped to the ./Tidytxt Perl script, which converts those characters to plain ASCII. The problem is that if the PERL_UNICODE environment variable is set, in any way, shape, or form (even empty!), the data coming in on STDIN is not treated byte-by-byte, but as Unicode characters. The s/// command then doesn't match per byte, but per Unicode character.

In this simple test, I copied and pasted one character from the w3m -dump output into a simple text file, and you can see from the hexdump that it is a 3-byte Unicode character:

[farm@ivwm01 doc-docbook]$ cat test1.txt
┌
[farm@ivwm01 doc-docbook]$ cat test1.txt | hexdump
0000000 94e2 0a8c
0000004
[farm@ivwm01 doc-docbook]$ cat test1.txt | ./Tidytxt.orig
┌
[farm@ivwm01 doc-docbook]$ cat test1.txt | ./Tidytxt
+

The fix above was to add to the beginning of the Tidytxt script:

--- ../../../exim/doc/doc-docbook/Tidytxt 2013-10-29 15:54:20.000000000 +0000
+++ ./Tidytxt 2014-01-13 16:38:23.000000000 +0000
@@ -11,6 +11,7 @@
 # (2) It uses U+25CF as its bullet character.
 # (3) It inserts a whole slew of "box drawing" characters round the heading.
+binmode(STDIN, ":encoding(iso-8859-1)");
 @lines = <>;
 $lastwasblank = 0;

1) Anybody who can assure me that this won't break on old perl versions? (I'm on 5.8 on this machine).
2) Anybody who can assure me that this won't break on new perl versions?
3) Anybody think of a better way to do this?

It really doesn't hurt the build process; it's just that in a couple of corner cases, the spec.txt file could have some non-ASCII in it. This is not the machine that I use to build the official releases, but it could have happened to anybody with just the right combination of environment/settings.

...Todd
-- 
The total budget at all receivers for solving senders' problems is $0.
If you want them to accept your mail and manage it the way you want, send it the way the spec says to. --John Levine
