On Fri, 2009-12-04 at 10:22 +0000, Julian Foad wrote:
> Branko Čibej wrote:
> > Bhuvaneswaran A wrote:
> > > The failure message for few tests contain special characters, ex:
>
> What do you mean by "special" characters? Unprintable characters?
> Non-UTF8 characters? Invalid XML characters? Characters that are XML
> syntax characters such as "<"?

By special characters, I mean control characters, e.g. "^H". Refer to the
attachment in issue 3541 for a sample character.

> > > prop_tests.py. As a result, it creates an invalid xml file and not being
> > > displayed in Hudson.
> > >
> > > This commit fixes this issue, also tracked in issue 3541. With this fix,
> > > the test results are displayed in Hudson, especially the results
> > > specific to 1.6.x solaris build.
> > > http://subversion.tigris.org/issues/show_bug.cgi?id=3541
> > >
> > > Index: tools/dev/gen_junit_report.py
> > > =======================================
> > > --- tools/dev/gen_junit_report.py (revision 886204)
> > > +++ tools/dev/gen_junit_report.py (working copy)
> > > @@ -46,6 +46,16 @@
> > >          data = data.replace(char, encode[char])
> > >      return data
> > > +def remove_special_characters(data):
> > > +    """remove special characters in test failure reasons"""
> > > +    if not data:
> > > +        return data
> > > +    chars_table = "".join([chr(n) for n in xrange(256)])
> > > +    # remove all special characters upto ascii value 31, except line
> > > feed (10)
> > > +    # and carriage return (13)
> > > +    chars_to_remove = chars_table[0:9] + chars_table[11:12] +
> > > chars_table[14:31]
> >
> > Isn't the indexing off by one? Should be [0:10] ... [11:13] ... [14:32].

As per the comment, I wanted to preserve LF (10) and CR (13).
[0:9] ... [11:12] ... [14:31] works for me.

> > Also, wouldn't it be more proper to find out why the tests put control
> > characters in the failure description than to just blindly throw them away?
>
> Or just escape the "special" characters.

Good point. I tried encoding using utf-8, but it doesn't seem to
detect/encode these characters, so the behaviour is unchanged. I used
something like:

reason = u'%s'.encode('utf-8') % reason
reason = unicode(reason, 'utf-8')

-- 
Bhuvaneswaran A
CollabNet Software P Ltd. | www.collab.net
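For reference, a minimal sketch of what the two sets of slice bounds discussed above actually remove. It is written for Python 3 (the patch itself targets Python 2 and uses xrange), and the helper names kept_controls and strip_controls, along with the sample string, are made up for illustration; they are not part of gen_junit_report.py. Python slices exclude their end index, so [0:9] stops at code point 8.

```python
# Python 3 sketch; the posted patch is Python 2 (xrange, unicode).
chars_table = "".join(chr(n) for n in range(256))

# Slice bounds as posted in the patch.
posted = chars_table[0:9] + chars_table[11:12] + chars_table[14:31]
# Slice bounds suggested in the review.
suggested = chars_table[0:10] + chars_table[11:13] + chars_table[14:32]


def kept_controls(removed):
    """Return the code points below 32 that are NOT in `removed`."""
    return [n for n in range(32) if chr(n) not in removed]


def strip_controls(data, removed):
    """Hypothetical helper: delete every character of `removed` from `data`."""
    if not data:
        return data
    return data.translate({ord(c): None for c in removed})


print(kept_controls(posted))     # [9, 10, 12, 13, 31]
print(kept_controls(suggested))  # [10, 13]

# Made-up failure reason containing ^H (8) and form feed (12).
sample = "svn: bad\x08 prop\x0c\nnext line"
print(repr(strip_controls(sample, posted)))     # form feed survives
print(repr(strip_controls(sample, suggested)))  # only LF survives

# Control characters are valid UTF-8, so an encode/decode round trip
# returns them unchanged.
print("\x08".encode("utf-8").decode("utf-8") == "\x08")  # True
```

With the posted bounds, tab (9), form feed (12) and unit separator (31) are kept in addition to LF and CR. XML 1.0 allows tab, LF and CR but not 12 or 31, so the suggested bounds match the code comment, while keeping tab as well would also be safe for the report.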