I've recently been looking at revamping an archive by having MHonArc output XML, which is then pulled into a PHP-based application using XML_Unserialize.
Mostly this is working fine, but I have the occasional problem with control characters in badly formatted emails. Specifically, when a quoted-printable email contains the sequence =12, MHonArc writes the decoded control character (0x12) to the XML. Such characters are not valid in XML, and the XML parser chokes on them.

I see a brief mention of a similar problem back in 2000:

http://www.mhonarc.org/archive/html/mhonarc-users/2000-07/msg00040.html

Have things changed? Is there any way, short of writing a custom filter or hacking/patching an existing one, to persuade MHonArc to strip out XML-illegal control characters? If not, any hints on where to start hacking?

Thanks

--
Chris Hastie
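
P.S. To make the requirement concrete, here is a minimal sketch of the filtering I'm after. It is written in Python purely for illustration (it is not MHonArc or PHP code) and assumes the XML is cleaned as a post-processing step before it reaches the parser. The function name `strip_xml_illegal` is hypothetical; the character ranges are those permitted by the `Char` production in XML 1.0.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_illegal(text: str) -> str:
    """Return text with every XML 1.0-illegal character removed.

    Hypothetical helper for illustration; tab, newline and carriage
    return are kept because XML 1.0 allows them.
    """
    return _XML_ILLEGAL.sub("", text)


if __name__ == "__main__":
    # 0x12 is the character that a quoted-printable "=12" decodes to.
    sample = "before\x12after"
    print(strip_xml_illegal(sample))
```

Something equivalent inside MHonArc itself, applied to the message text before it is written out, is what I would ideally like.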