I'm writing a one-step generic reporting tool along the lines of the "BLAST XML to
tabular" script by Peter. I was just about to ask about this line, which
looked pretty much like a bug:

        sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(" >"))

Then I found the patch from Nov 7th 2013:

        https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/blastxml_to_tabular.py

        try:
                sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(" >"))
        except IndexError as e:
                stop_err("Problem splitting multiple hits?\n%r\n--> %s" % (hit_def, e))
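For reference, here is a minimal standalone sketch of what that one-liner does with a Hit_def like the one in mirna.xml. `all_seq_ids` is a hypothetical helper name, and prepending Hit_id to Hit_def is an assumption made here so that every " >"-separated piece starts with an ID. It also shows the case the patch guards against: an empty or whitespace-only piece makes `name.split(None, 1)` return an empty list, so `[0]` raises IndexError.

```python
def all_seq_ids(hit_def):
    """Hypothetical helper: ';'-join the first token of each ' >'-separated piece.

    Same expression as the quoted one-liner. Raises IndexError when a piece
    is empty or whitespace-only, because name.split(None, 1) returns [].
    """
    return ";".join(name.split(None, 1)[0] for name in hit_def.split(" >"))


# Values from the mirna.xml hit, with the separator already as a plain ">".
hit_id = "gi|195029385|ref|XR_047134.1|"
hit_def = (r"Drosophila grimshawi miR-7-RA (Dgri\mir-7), ncRNA "
           r">gi|195336156|ref|XR_048470.1| Drosophila sechellia miR-7-RA (Dsec\mir-7), ncRNA "
           r">gi|195585143|ref|XR_050309.1| Drosophila simulans miR-7-RA (Dsim\mir-7), ncRNA")

# Assumption: prepend Hit_id so the first piece also starts with an ID.
print(all_seq_ids(hit_id + " " + hit_def))
# gi|195029385|ref|XR_047134.1|;gi|195336156|ref|XR_048470.1|;gi|195585143|ref|XR_050309.1|

try:
    all_seq_ids("")  # an empty Hit_def splits into one empty piece
except IndexError as e:
    print("IndexError:", e)
```

Without the prepended Hit_id, the first piece would yield "Drosophila" rather than an ID, since the first hit's ID is carried in Hit_id, not Hit_def.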

Yay! But what I've seen in recent BLAST XML output is that the ">" separator
inside Hit_def is written as "&gt;". E.g.:

        https://github.com/biopython/biopython/blob/master/Tests/Blast/mirna.xml

        <Hit>
                <Hit_num>66</Hit_num>
                <Hit_id>gi|195029385|ref|XR_047134.1|</Hit_id>
                <Hit_def>Drosophila grimshawi miR-7-RA (Dgri\mir-7), ncRNA &gt;gi|195336156|ref|XR_048470.1| Drosophila sechellia miR-7-RA (Dsec\mir-7), ncRNA &gt;gi|195585143|ref|XR_050309.1| Drosophila simulans miR-7-RA (Dsim\mir-7), ncRNA</Hit_def>
                <Hit_accession>XR_047134</Hit_accession>
                ...

So perhaps the stop_err() could be avoided if the split were on " &gt;"
instead? I assume that no variant of Python's ElementTree.iterparse()
unescapes the content it returns via the iterator?
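That assumption can be checked directly. The sketch below uses only the standard library: it feeds a trimmed copy of the mirna.xml hit through `xml.etree.ElementTree.iterparse()` and inspects `elem.text` for Hit_def. The standard parser resolves predefined entities such as "&gt;" before the text reaches Python, so this check should report a plain " >" and no "&gt;"; if that holds for the parser actually in use, the existing `split(" >")` already matches. Other parsers (e.g. lxml) are not covered by this sketch.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the <Hit> from Biopython's Tests/Blast/mirna.xml; the raw
# file stores the separator as "&gt;".
XML = rb"""<Hit>
  <Hit_num>66</Hit_num>
  <Hit_id>gi|195029385|ref|XR_047134.1|</Hit_id>
  <Hit_def>Drosophila grimshawi miR-7-RA (Dgri\mir-7), ncRNA &gt;gi|195336156|ref|XR_048470.1| Drosophila sechellia miR-7-RA (Dsec\mir-7), ncRNA &gt;gi|195585143|ref|XR_050309.1| Drosophila simulans miR-7-RA (Dsim\mir-7), ncRNA</Hit_def>
  <Hit_accession>XR_047134</Hit_accession>
</Hit>"""

hit_def = None
for event, elem in ET.iterparse(io.BytesIO(XML), events=("end",)):
    if elem.tag == "Hit_def":
        hit_def = elem.text  # text exactly as delivered by the parser

print(repr(hit_def))
print('" >" present:', " >" in hit_def)
print('"&gt;" present:', "&gt;" in hit_def)
```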

Damion
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
