On Mon, Apr 23, 2012 at 11:41 PM, Sarah Hicks <garlicsc...@gmail.com> wrote:
> Peter, you requested an example, here are the first five hits for my
> first query sequence (OTU#0)
>
> 0       324034994       527     93.23   266     13      5       1       265     22      283     7e-102  379.0
> 0       56181650        513     93.26   267     10      8       1       265     25      285     7e-102  379.0
> 0       314913953       582     91.79   268     13      9       1       265     24      285     2e-92   347.0
> 0       305670062       281     92.52   254     14      5       4       256     32      281     2e-92   347.0
> 0       310814066       1180    91.73   266     14      7       1       265     24      282     9e-92   345.0
>
> You will notice there are 13 columns, one in addition to the 12 column
> titles you explained. This is because there is a column between sseqID
> and pident.

I see now - megablast_wrapper.py calls megablast (from the old legacy
NCBI BLAST suite), which does indeed produce 12-column tabular output. But
the wrapper script then edits the output:

It appears to split column 2 in two at the underscore, with the intention
of giving the match ID and the length as separate columns. This puzzles
me, but I haven't used the legacy BLAST tabular output for a while. In
BLAST+ you can ask for the query or subject length explicitly as their
own columns, so we don't have this problem.
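
As a rough sketch of that splitting step (this is not the actual wrapper
code), assume the subject ID in column 2 has the form
"<match ID>_<subject length>". The function name and the combined ID
"324034994_527" below are hypothetical, reconstructed from the first hit
in the example above purely for illustration:

```python
def split_subject_column(line):
    """Turn a 12-column tabular hit into 13 columns by splitting column 2."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError("expected 12 tab-separated columns, got %d" % len(fields))
    # Assumed ID layout: "<match ID>_<subject length>"; split at the last underscore.
    match_id, _, subject_length = fields[1].rpartition("_")
    if not match_id or not subject_length.isdigit():
        raise ValueError("subject ID %r is not of the form ID_length" % fields[1])
    return fields[:1] + [match_id, subject_length] + fields[2:]


# Hypothetical raw 12-column line, before the wrapper edits it.
hit = "0\t324034994_527\t93.23\t266\t13\t5\t1\t265\t22\t283\t7e-102\t379"
print(split_subject_column(hit)[:4])  # ['0', '324034994', '527', '93.23']
```

If the database IDs were built some other way, the extra column would not
mean what the tutorial says it means.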

megablast_wrapper.py also re-formats the floating point score in the
last column; apparently the NCBI style could cause problems with the
Galaxy filter tool.
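
I have not checked exactly how the wrapper does this, so the following is
only a guess at the idea: parse the last column as a float and write it
back in plain Python notation. The function name is hypothetical, and the
input styles shown (a bare integer, exponent notation) are assumptions
about what the legacy output might contain:

```python
def normalize_score(fields):
    """Return a copy of the hit with the last column rewritten as a plain float."""
    fields = list(fields)
    # float() accepts forms like "379" or "1e+03"; str() writes them back
    # as "379.0" or "1000.0", the style seen in the example output.
    fields[-1] = str(float(fields[-1]))
    return fields


print(normalize_score(["0", "324034994", "527", "379"])[-1])    # 379.0
print(normalize_score(["0", "324034994", "527", "1e+03"])[-1])  # 1000.0
```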

> In the metagenomic tutorial the first 4 columns are
> explained, and column 3 is described as length of sequence in database
> (or length of the subject sequence).
>
> This is the problem column. Only one of the subject GI numbers above
> has a length that matches the subject length in NCBI. This has caused
> me to wonder if I can trust the hit info. In all cases that I've
> checked, when this happens the correct match is the listed GI value
> minus 1 (i.e., in NCBI, gi|324034994 is not 527nt long, but 324034993
> IS 527nt long).

That is strange.

Peter

___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/
