On Wed, Nov 20, 2013 at 7:10 PM, Dooley, Damion <damion.doo...@bccdc.ca> wrote:
> I'm doing a one-step generic reporting tool along the lines of the "BLAST
> XML to tabular" script by Peter.  I was just about to ask about this line,
> which looked pretty much like a bug:
>
>         sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(" >"))
>
> Then I found the patch from Nov 7th 2013:
>
>         https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/blastxml_to_tabular.py
>
>         try:
>                 sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(" >"))
>         except IndexError as e:
>                 stop_err("Problem splitting multuple hits?\n%r\n--> %s" % (hit_def, e))
>
> Yay!  But what I've seen in recent XML output reports is that the ">"
> content has been changed to "&gt;", e.g.
>
>         https://github.com/biopython/biopython/blob/master/Tests/Blast/mirna.xml
>
>         <Hit>
>                 <Hit_num>66</Hit_num>
>                 <Hit_id>gi|195029385|ref|XR_047134.1|</Hit_id>
>                 <Hit_def>Drosophila grimshawi miR-7-RA (Dgri\mir-7), ncRNA &gt;gi|195336156|ref|XR_048470.1| Drosophila sechellia miR-7-RA (Dsec\mir-7), ncRNA &gt;gi|195585143|ref|XR_050309.1| Drosophila simulans miR-7-RA (Dsim\mir-7), ncRNA</Hit_def>
>                 <Hit_accession>XR_047134</Hit_accession>
>                 ...
>
> So perhaps a stop_err() could be avoided if the test were for "&gt;" instead?
> I assume that no variant of Python's ElementTree.iterparse() will
> unescape content when returned via the iterator?
>
> Damion


On Wed, Nov 20, 2013 at 7:31 PM, Dooley, Damion <damion.doo...@bccdc.ca> wrote:
> Whoops - I realize now findtext() must be unescaping all "&gt;", so Peter
> was trying to address other non-splitting occurrences of " >" as per his
> patch notes.  But perhaps a stop_err() isn't merited in this case?
>
> So ignore my comment about testing for "&gt;".
>
> Regards,
>
> Damion

OK - good. I was worried that there might be some inconsistency
between different databases or versions of BLAST about how
the &gt; was encoded.
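
For anyone who wants to check the unescaping for themselves, here is a
minimal sketch using only the standard library. The XML snippet is a
shortened, made-up version of the mirna.xml record quoted above, and
joining Hit_id to Hit_def before splitting is an assumption made for
illustration, not necessarily what the tool does.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for a BLAST XML <Hit>, with an escaped "&gt;" separator.
xml_text = (
    "<Hit>"
    "<Hit_id>gi|195029385|ref|XR_047134.1|</Hit_id>"
    "<Hit_def>Drosophila grimshawi miR-7-RA, ncRNA "
    "&gt;gi|195336156|ref|XR_048470.1| Drosophila sechellia miR-7-RA, ncRNA"
    "</Hit_def>"
    "</Hit>"
)

hit_id = hit_def = None
for event, elem in ET.iterparse(io.BytesIO(xml_text.encode("ascii")), events=("end",)):
    if elem.tag == "Hit":
        hit_id = elem.findtext("Hit_id")
        hit_def = elem.findtext("Hit_def")

# By this point the parser has already turned "&gt;" back into ">".
print(repr(hit_def))

# Assumption: prefix the primary ID so every " >"-separated chunk starts with an ID.
combined = hit_id + " " + hit_def
sallseqid = ";".join(name.split(None, 1)[0] for name in combined.split(" >"))
print(sallseqid)
```

So the split on " >" sees a literal ">" whether or not the XML file
wrote it as "&gt;".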

As to why I treat this as a fatal error (calling stop_err), the
alternative would be to issue a warning to stderr, and guess
what the data ought to look like? That just seems like asking
for trouble - a big red error should ensure I hear bug reports ;)

Zen of Python: Errors should never pass silently.
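
To make the failure mode concrete, a rough sketch: first_ids() is a
hypothetical helper name, and it raises ValueError where the tool
would call stop_err(), purely so it can be tried interactively.

```python
def first_ids(hit_def):
    """Join the first word of each ' >'-separated title with ';'."""
    try:
        return ";".join(name.split(None, 1)[0] for name in hit_def.split(" >"))
    except IndexError as e:
        # An empty chunk between separators gives no first word to take.
        # Fail loudly rather than guess what the data ought to look like.
        raise ValueError("Problem splitting multiple hits?\n%r\n--> %s" % (hit_def, e))


print(first_ids("id1 first title >id2 second title"))

try:
    first_ids("id1 first title > >id2 second title")
except ValueError as err:
    print("Rejected:", err)
```

A title containing a stray " >" followed by nothing usable is exactly
the sort of input where silently skipping the chunk would hide a bug.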

Peter