Please find attached the revised patch. I incorporated following feedback: a) Fix the array slicing part b) Escape using ord() instead of removing those characters c) Handle "]]>" in CDATA section d) Define the ascii table globally (once) and re-use
I also verified this fix by generating the junit files for tests having special characters and simulating a test that has "]]>" in failure text. With this patch, it generates valid junit file. [[ Fix for issue 3541. When generating junit reports, escape special characters, if any, from test failure messages. * tools/dev/gen_junit_report.py (escape_special_characters): New method to escape special characters. (junit_testcase_fail, junit_testcase_xfail): Use escape_special_characters() method to escape special characters in test failure messages. ]] On Fri, 2009-12-04 at 11:19 +0000, Julian Foad wrote: > Bhuvaneswaran A wrote: > > On Fri, 2009-12-04 at 10:22 +0000, Julian Foad wrote: > > > Branko Čibej wrote: > > > > Bhuvaneswaran A wrote: > > > > > The failure message for few tests contain special characters, ex: > > > > > > What do you mean by "special" characters? Unprintable characters? > > > Non-UTF8 characters? Invalid XML characters? Characters that are XML > > > syntax characters such as "<"? > > > > When I mean special characters, I mean control characters, ex "^H". > > Refer to the attachment in issue 3541 for a sample character. > > > > > > > prop_tests.py. As a result, it creates an invalid xml file and not > > > > > being > > > > > displayed in Hudson. > > > > > > > > > > This commit fixes this issue, also tracked in issue 3541. With this > > > > > fix, > > > > > the test results are displayed in Hudson, especially the results > > > > > specific to 1.6.x solaris build. > > > > > http://subversion.tigris.org/issues/show_bug.cgi?id=3541 > > > > > > > > > > Index: tools/dev/gen_junit_report.py > > > > > ======================================= > > > > > --- tools/dev/gen_junit_report.py (revision 886204) > > > > > +++ tools/dev/gen_junit_report.py (working copy) > > > > > @@ -46,6 +46,16 @@ > > > > > data = data.replace(char, encode[char]) > > > > > return data > > > > > +def remove_special_characters(data): > > > > > + """remove special characters in test failure reasons""" > > > > > + if not data: > > > > > + return data > > > > > + chars_table = "".join([chr(n) for n in xrange(256)]) > > > > > + # remove all special characters upto ascii value 31, except line > > > > > feed (10) > > > > > + # and carriage return (13) > > > > > + chars_to_remove = chars_table[0:9] + chars_table[11:12] + > > > > > chars_table[14:31] > > > > > > Isn't the indexing off by one? Should be [0:10] ... [11:13] ... [14:32]. > > > > As per the comment, I wanted to preserve LF (10) and CR (13). [0:9] ... > > [11:12] ... [14:31] works for me. > > In array slicing, the range start is inclusive but the end is exclusive, > so [11:12] just gives the single character [11], whereas [11:13] would > give [11] and [12] which is what you want. > > > > > Also, wouldn't it be more proper to find out why the tests put control > > > > characters in the failure description than to just blindly throw them > > > > away? > > > > > > Or just escape the "special" characters. > > > > Good point. I used to encode using utf-8, but it doesn't seem to > > detect/encode these characters, resulting in unchanged behaviour. I used > > something like: > > > > reason = u'%s'.encode('utf-8') % reason > > reason = unicode(reason, 'utf-8') > > Those statements are just converting from UTF-8 to internal Unicode > representation, not doing any escaping. Control characters (and all > characters) are valid in both UTF-8 and internal Unicode > representations, so no escaping is needed for the conversion. > > I searched on the web and didn't find a really really simple way to > escape a set of characters. I think something like > > for c in chars_to_remove: > data = data.replace(c, '%%%0x' % ord(c)) > > would do it. > > - Julian > > -- Bhuvaneswaran A CollabNet Software P Ltd. | www.collab.net
Index: tools/dev/gen_junit_report.py
===================================================================
--- tools/dev/gen_junit_report.py (revision 886204)
+++ tools/dev/gen_junit_report.py (working copy)
@@ -33,6 +33,8 @@
import os
import getopt
+ASCII_TABLE = "".join([chr(n) for n in xrange(256)])
+
def xml_encode(data):
"""encode the xml characters in the data"""
encode = {
@@ -46,6 +48,18 @@
data = data.replace(char, encode[char])
return data
+def escape_special_characters(data):
+ """remove special characters in test failure reasons"""
+ if not data:
+ return data
+ # remove all special characters upto ascii value 31, except line feed (10)
+ # and carriage return (13)
+ chars_to_remove = ASCII_TABLE[0:10] + ASCII_TABLE[11:13] + ASCII_TABLE[14:32]
+ for char in chars_to_remove:
+ data = data.replace(char, '%%%0x' % ord(char))
+ data = data.replace(']]>', ']]]]><![CDATA[>')
+ return data
+
def start_junit():
"""define the beginning of xml document"""
head = """<?xml version="1.0" encoding="UTF-8"?>"""
@@ -72,6 +86,7 @@
"""mark the test case as FAILED"""
casename = xml_encode(casename)
sub_test_name = test_name.replace('.', '-')
+ reason = escape_special_characters(reason)
case = """<testcase time="ELAPSED_CASE_%s" name="%s" classname="%s">
<failure type="Failed"><![CDATA[%s]]></failure>
</testcase>""" % (test_name, casename, sub_test_name, reason)
@@ -81,6 +96,7 @@
"""mark the test case as XFAILED"""
casename = xml_encode(casename)
sub_test_name = test_name.replace('.', '-')
+ reason = escape_special_characters(reason)
case = """<testcase time="ELAPSED_CASE_%s" name="%s" classname="%s">
<system-out><![CDATA[%s]]></system-out>
</testcase>""" % (test_name, casename, sub_test_name, reason)
signature.asc
Description: This is a digitally signed message part

