Re: [PATCH] Handle special characters in Junit report

Bhuvaneswaran A Fri, 04 Dec 2009 02:38:51 -0800

On Fri, 2009-12-04 at 10:22 +0000, Julian Foad wrote:
> Branko Čibej wrote:
> > Bhuvaneswaran A wrote:
> > > The failure message for few tests contain special characters, ex:
> 
> What do you mean by "special" characters? Unprintable characters?
> Non-UTF8 characters? Invalid XML characters? Characters that are XML
> syntax characters such as "<"?


By special characters, I mean control characters, e.g. "^H" (backspace).
See the attachment in issue 3541 for a sample character.

> > > prop_tests.py. As a result, it creates an invalid xml file and not being
> > > displayed in Hudson.
> > >
> > > This commit fixes this issue, also tracked in issue 3541. With this fix,
> > > the test results are displayed in Hudson, especially the results
> > > specific to 1.6.x solaris build.
> > >   http://subversion.tigris.org/issues/show_bug.cgi?id=3541
> > >   
> > > Index: tools/dev/gen_junit_report.py
> > > =======================================
> > > --- tools/dev/gen_junit_report.py (revision 886204)
> > > +++ tools/dev/gen_junit_report.py (working copy)
> > > @@ -46,6 +46,16 @@
> > > data = data.replace(char, encode[char])
> > > return data
> > > +def remove_special_characters(data):
> > > + """remove special characters in test failure reasons"""
> > > + if not data:
> > > + return data
> > > + chars_table = "".join([chr(n) for n in xrange(256)])
> > > + # remove all special characters upto ascii value 31, except line feed (10)
> > > + # and carriage return (13)
> > > + chars_to_remove = chars_table[0:9] + chars_table[11:12] + chars_table[14:31]
> 
> Isn't the indexing off by one? Should be [0:10] ... [11:13] ... [14:32].

As per the comment, I wanted to preserve LF (10) and CR (13). [0:9] ...
[11:12] ... [14:31] works for me.
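
For reference, Python slices are half-open: the end index is excluded. The
sketch below is written for Python 3 (the patch targets Python 2 and uses
xrange), and every name other than chars_table is illustrative rather than
taken from the patch. It lists which code points below 32 each set of slices
leaves in place:

```python
# Python 3 sketch; names other than chars_table are illustrative only.
chars_table = "".join(chr(n) for n in range(256))

# Slices as written in the patch (end index is excluded).
patch_slices = chars_table[0:9] + chars_table[11:12] + chars_table[14:31]

# Slices as suggested in the review.
review_slices = chars_table[0:10] + chars_table[11:13] + chars_table[14:32]

patch_codes = {ord(c) for c in patch_slices}
review_codes = {ord(c) for c in review_slices}

# Code points below 32 that each variant does NOT remove.
kept_by_patch = [n for n in range(32) if n not in patch_codes]
kept_by_review = [n for n in range(32) if n not in review_codes]

print(kept_by_patch)
print(kept_by_review)
```

Both variants keep LF (10) and CR (13). The slices in the patch also leave
tab (9), form feed (12), and unit separator (31) untouched; of those three,
only tab is a legal character in XML 1.0.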

> > Also, wouldn't it be more proper to find out why the tests put control
> > characters in the failure description than to just blindly throw them away?
> 
> Or just escape the "special" characters.

Good point. I previously tried encoding with UTF-8, but that does not seem
to detect or encode these characters, so the behaviour was unchanged. I used
something like:

reason = u'%s'.encode('utf-8') % reason
reason = unicode(reason, 'utf-8')
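
A UTF-8 round trip is expected to leave these characters alone, because
control characters such as backspace are valid code points in UTF-8; they are
only disallowed by XML 1.0. A minimal Python 3 sketch of the escaping
alternative follows. The helper name escape_control_characters and the sample
string are hypothetical, not part of gen_junit_report.py:

```python
import re

# Control characters that XML 1.0 does not allow: everything below 0x20
# except tab (0x09), line feed (0x0A), and carriage return (0x0D).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def escape_control_characters(data):
    """Hypothetical helper: make disallowed control characters visible.

    Each match is replaced by a backslash, 'x', and two hex digits,
    instead of being silently dropped.
    """
    if not data:
        return data
    return _INVALID_XML_CHARS.sub(
        lambda m: "\\x%02x" % ord(m.group()), data)


# Sample failure reason containing a backspace (^H); illustrative only.
reason = "bad prop\x08value"

# Encoding to UTF-8 and decoding again returns the same string,
# so the backspace is still present afterwards.
round_tripped = reason.encode("utf-8").decode("utf-8")

# Escaping replaces the backspace with printable text.
escaped = escape_control_characters(reason)
```

Escaping keeps the information about which character the test emitted, which
may help when tracking down why it appears in the failure description.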

-- 
Bhuvaneswaran A    
CollabNet Software P Ltd.  |  www.collab.net
