Peter,

It's good news that the description field is now available in tabular format.
Its absence has been the primary reason for using the XML format. The tabular
format also allows use of the task parallel feature.

I think you have good default options for output format.
However, I think we should also offer the option to get all available result
information. The multi-select checkboxes could serve that purpose. Is NCBI
good about maintaining the column positions in their tabular output over
successive versions?

Thanks,

JJ



On 11/28/13, 11:13 AM, Peter Cock wrote:
Hello all,

FAO: Administrators of local Galaxy instances using the NCBI BLAST+ wrappers.

Over on the galaxy_blast repository I have been updating the
NCBI BLAST+ wrappers (including unit tests) to work with the
current release, NCBI BLAST+ 2.2.28 (aka BLAST 2.2.28+):
https://github.com/peterjc/galaxy_blast

The initial set of changes is now on the Test Tool Shed,
http://testtoolshed.g2.bx.psu.edu/view/peterjc/ncbi_blast_plus

This includes a workaround for a known regression in the
makeblastdb tool dealing with duplicated identifiers:
https://github.com/peterjc/galaxy_blast/commit/349e31c6cec4429c5523fde5975e28e399e0dfb1

In terms of end-user features, the big improvement in the
BLAST+ 2.2.28 release was the ability to get the BLAST
match descriptions in the tabular output, and other fields:
http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html

           staxids means Subject Taxonomy ID(s), separated by a ';'
         sscinames means Subject Scientific Name(s), separated by a ';'
         scomnames means Subject Common Name(s), separated by a ';'
       sblastnames means Subject Blast Name(s), separated by a ';'
                   (in alphabetical order)
        sskingdoms means Subject Super Kingdom(s), separated by a ';'
                   (in alphabetical order)
            stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
           sstrand means Subject Strand
             qcovs means Query Coverage Per Subject
           qcovhsp means Query Coverage Per HSP
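
As an illustration only, here is a minimal Python sketch of reading one
line of such tabular output, splitting the multi-valued fields on the
separators listed above. The column list and the helper name parse_hit
are assumptions made for the example; the columns must match whatever
was actually requested from BLAST+.

```python
# Separators for multi-valued fields, as given in the NCBI field list above.
MULTI_VALUE_SEPARATORS = {
    "staxids": ";",
    "sscinames": ";",
    "scomnames": ";",
    "sblastnames": ";",
    "sskingdoms": ";",
    "salltitles": "<>",
}


def parse_hit(line, columns):
    """Turn one tab-separated BLAST hit line into a dict keyed by column name.

    `columns` is the list of field names in the order they were requested
    (hypothetical here; it is not recorded in the tabular file itself).
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(columns):
        raise ValueError(
            "Expected %i columns, found %i" % (len(columns), len(values))
        )
    hit = dict(zip(columns, values))
    # Split only the fields documented as holding several values.
    for name, separator in MULTI_VALUE_SEPARATORS.items():
        if name in hit:
            hit[name] = hit[name].split(separator)
    return hit


# Made-up example line, not real BLAST output.
example_columns = ["qseqid", "sseqid", "staxids", "salltitles"]
example_line = "query1\tsubject1\t9606;10090\tfirst title<>second title\n"
example_hit = parse_hit(example_line, example_columns)
```

Keying the values by name rather than by position means downstream code
does not depend on where a given field sits in the table.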

On this branch I am including the new salltitles field as the
25th column in the extended BLAST tabular output offered
within the Galaxy interface:

https://github.com/peterjc/galaxy_blast/tree/c25

However, I'm not so sure about the taxonomy fields. Since
(thus far) they are not available via the XML, I am leaning
towards introducing a third tabular mode, e.g.

* Standard 12 columns (can convert from XML)
* Extended 25 columns (can convert from XML)
* Extended also with taxonomy (cannot currently convert from XML)

Alternatively, we could offer a pick-your-own columns route
(in all the primary BLAST tools, handled via macros):

* Standard 12 columns (can convert from XML)
* Extended 25 columns (can convert from XML)
* Pick your own columns from the full NCBI list
(depending on columns, can convert from XML)
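
To make the three modes concrete, here is a rough Python sketch of how
a column selection might be turned into the BLAST+ -outfmt argument,
and how one could check whether a selection can still be produced from
XML. The names outfmt_argument and can_convert_from_xml are invented
for this example, and the choice of columns 13 to 24 is an assumption;
only the 12 standard columns and salltitles as column 25 are taken from
the description above.

```python
# The 12 standard columns of BLAST+ tabular output (-outfmt 6).
STANDARD_12 = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Columns 13 to 24 are an assumed selection for illustration;
# column 25 (salltitles) is the new field discussed here.
EXTENDED_25 = STANDARD_12 + [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen",
    "salltitles",
]

# Taxonomy fields, which are (thus far) not available via the XML.
TAXONOMY_FIELDS = {
    "staxids", "sscinames", "scomnames", "sblastnames", "sskingdoms",
}


def outfmt_argument(columns):
    """Build the value for -outfmt, e.g. "6 qseqid sseqid"."""
    return "6 " + " ".join(columns)


def can_convert_from_xml(columns):
    """Return False if any requested column is a taxonomy field."""
    return not TAXONOMY_FIELDS.intersection(columns)


# A pick-your-own selection including a taxonomy column.
custom_columns = ["qseqid", "sseqid", "evalue", "staxids", "salltitles"]
custom_outfmt = outfmt_argument(custom_columns)
custom_from_xml = can_convert_from_xml(custom_columns)
```

Because the column names are passed explicitly, the order of fields in
the output is fixed by the wrapper rather than by BLAST+ defaults.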

This is inspired by JJ's changes to the BLAST XML to
tabular conversion tool for Galaxy-P,
https://github.com/jmchilton/galaxy_blast/commit/d79afc03522768323494818a40aac10513a6fa09

I would be much keener on the pick-your-own columns
option if it were possible for the tool to record arbitrary
column names for a tabular file in Galaxy's metadata
(I can't find a Trello card, but I'm sure I've asked about
this before).

Any thoughts or comments? E.g. "Hurry up and just release
this branch adding the hit descriptions as column 25, we
want that now" ;) [*]

Regards,

Peter

[*] For our local instance, the taxonomy stuff will be useful,
but right now I would prioritise the description, which we
currently get via the BLAST XML using this tool:
https://github.com/peterjc/galaxy_blast/tree/master/tools/blastxml_to_top_descr


--
James E. Johnson, Minnesota Supercomputing Institute, University of Minnesota
