>As Robert and Ken pointed out, one explanation could be that the
>content is converted twice, the second time incorrectly.

I saw those replies, but I wasn't sure how to interpret them (as in, the
evidence is compelling, but I have no idea why that would be happening or
what to do about it).

>I don't see at this point how mhfixmsg could do that but this needs more
>investigation. We can continue this way, or if you want to send me a
>sanitized excerpt of the message, I'd be glad to work with it.

I can't think of a reasonable way to sanitize it, but I'm willing to send
it to you privately. Should I use your <[email protected]> address for
this purpose?

>> $ mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 -reformat \
>> -fixcte -fixboundary -noreplacetextplain \
>> -fixtype application/octet-stream -verbose -file - \
>> -outfile $destination < $source
>> mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, decode text/plain;
>> charset=iso-8859-1
>> mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 1, decode text/html;
>> charset=iso-8859-1
>> mhfixmsg: /home/smw/Mail/mhfixmsgnss3pI part 2, convert UTF-8 to UTF-8
>>
>> ...which is interesting for more than one reason, including that there's
>> apparently no conversion of iso-8859-1 to UTF-8,
>
>That's strange, unless $source had already been run through mhfixmsg.

It hadn't.

In normal use my procmail-invoked shell script does run the message through
a program I wrote myself, which decodes RFC 2047-encoded headers -- but that
only affects the headers, and passes the body through unmodified. The
relevant excerpt for that is:

    [ loop that processes header lines elided ]

            /** an empty input line means the end of the message headers: **/

            if (strlen(input_line) < 1) break;
        }


        /** read and write message body: **/

        while (getline(&input_line, &len, infile) >= 0)
        {
            fputs(input_line, outfile);
        }


        /** ...and we're done: **/

        return(0);

    }

The only change this produces in the problematic message is as follows:

    47,57c47,57
    < X-SG-EID: =?us-ascii?Q?CePduXinO1TKWf=2FmbcRcIcb5o7KEfW6Q=2FLxIZrPrRA0dtxQ5evb2UIV0M0r6v6?=
    < =?us-ascii?Q?DfqG=2FoldGlAr6l6p1riD1OEyVdX0=2F57dKo740dz?=
    < =?us-ascii?Q?NZIhwlTw5J3KSyIU4H7pjfyfMBv0e9LGxKHVezS?=
    < =?us-ascii?Q?FeSLaVJyOzyyK3LeB3eGx+QysKjtjkJzuVDXsW4?=
    < =?us-ascii?Q?ZiePczPvW34XaHeheXAl2m0RGMRgZENpvRzzX2M?=
    < =?us-ascii?Q?G6=2FuEHfZ5+X57rF1w=3D?=
    < X-SG-ID: =?us-ascii?Q?N2C25iY2uzGMFz6rgvQsb8raWjw0ZPf1VmjsCkspi=2FKHgAsE=2FCUk5eZaRe5Ltr?=
    < =?us-ascii?Q?cbw5EBe1xYnaBlEvYrWq76guWX6eVcLnBjZLZsv?=
    < =?us-ascii?Q?fUgud7M9swcG4+O7RGb81dd6HibI6WdUCRYi2bx?=
    < =?us-ascii?Q?T8y2GlCc1B+71TSgKjD9dEU2IqN30RZ1qRbAGlx?=
    < =?us-ascii?Q?5EAyl462xuJc+?=
    ---
    > X-SG-EID: CePduXinO1TKWf/mbcRcIcb5o7KEfW6Q/LxIZrPrRA0dtxQ5evb2UIV0M0r6v6
    > DfqG/oldGlAr6l6p1riD1OEyVdX0/57dKo740dz
    > NZIhwlTw5J3KSyIU4H7pjfyfMBv0e9LGxKHVezS
    > FeSLaVJyOzyyK3LeB3eGx+QysKjtjkJzuVDXsW4
    > ZiePczPvW34XaHeheXAl2m0RGMRgZENpvRzzX2M
    > G6/uEHfZ5+X57rF1w=
    > X-SG-ID: N2C25iY2uzGMFz6rgvQsb8raWjw0ZPf1VmjsCkspi/KHgAsE/CUk5eZaRe5Ltr
    > cbw5EBe1xYnaBlEvYrWq76guWX6eVcLnBjZLZsv
    > fUgud7M9swcG4+O7RGb81dd6HibI6WdUCRYi2bx
    > T8y2GlCc1B+71TSgKjD9dEU2IqN30RZ1qRbAGlx
    > 5EAyl462xuJc+

...but in my testing last night and just now, I see the same behavior when
I run mhfixmsg directly on the unmodified original file (my script always
saves an unmodified copy when it makes changes, in case something goes
wrong).

>Conversion to the same charset is a no-op, I'll look into removing the
>verbose output in that case.

That's probably a helpful thing to do, but the question I was wondering
about wasn't why the UTF-8-to-UTF-8 conversion was reported, but rather
why the iso-8859-1-to-UTF-8 conversion wasn't reported.

>> and that in fact it's part 1 rather than part 2 that gets converted
>> improperly
>
>The part numbers are reversed because that's the order used for display.
>Part 2 is the text/plain part, that's the one that got converted.

Thank you. That clears up part of my confusion.

- Steven
-- 
___________________________________________________________________________
Steven Winikoff        | "The thing is, I mean, there's times when
Montreal, QC, Canada   | you look at the universe and you think,
[email protected] | 'What about me?' and you can just hear
http://smwonline.ca    | the universe replying, 'Well, what about
                       | you?'"
                       |            - Terry Pratchett (Thief of Time)
