Peter, you requested an example. Here are the first five hits for my
first query sequence (OTU#0):

0       324034994       527     93.23   266     13      5       1       265     22      283     7e-102  379.0
0       56181650        513     93.26   267     10      8       1       265     25      285     7e-102  379.0
0       314913953       582     91.79   268     13      9       1       265     24      285     2e-92   347.0
0       305670062       281     92.52   254     14      5       4       256     32      281     2e-92   347.0
0       310814066       1180    91.73   266     14      7       1       265     24      282     9e-92   345.0

You will notice there are 13 columns, one more than the 12 column
titles you explained. This is because there is an extra column between
sseqid and pident. In the metagenomic tutorial the first 4 columns are
explained, and column 3 is described as length of sequence in database
(or length of the subject sequence).
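
To make the layout concrete, here is a minimal Python sketch that reads
one hit line using the 13-column layout described above. It assumes the
extra third column really is the subject length. The label "slen" is
borrowed from the BLAST+ format specifier for subject length and is an
assumption here, not a name the Galaxy tool is known to use.
parse_hit is a hypothetical helper.

```python
# Assumed 13-column layout: the 12 standard BLAST tabular columns plus
# an extra third column, labelled "slen" here (an assumption).
COLUMNS = [
    "qseqid", "sseqid", "slen", "pident", "length", "mismatch",
    "gapopen", "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = {"slen", "length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_hit(line):
    """Split one whitespace-separated hit line into a dict keyed by column."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(
            "expected %d columns, got %d" % (len(COLUMNS), len(fields))
        )
    hit = {}
    for name, value in zip(COLUMNS, fields):
        if name in INT_COLUMNS:
            hit[name] = int(value)
        elif name in FLOAT_COLUMNS:
            hit[name] = float(value)
        else:
            hit[name] = value  # IDs are kept as strings
    return hit


# First hit from the example above
hit = parse_hit("0 324034994 527 93.23 266 13 5 1 265 22 283 7e-102 379.0")
print(hit["sseqid"], hit["slen"], hit["pident"])
```

Keeping the IDs as strings avoids accidentally treating a GI number as
something to do arithmetic on.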

This is the problem column. Only one of the subject lengths listed
above matches the length NCBI reports for that GI number. This has
made me wonder whether I can trust the hit info. In every case I've
checked where the length does not match, the correct match is the
listed GI value minus 1 (i.e., in NCBI, gi|324034994 is not 527 nt
long, but 324034993 IS 527 nt long).
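
One check that needs no NCBI lookup, sketched in Python under the
assumption that column 3 is the subject length: an alignment cannot run
past the end of its subject, so the larger of sstart and send should
never exceed that length. This only tests whether column 3 is
consistent with the alignment coordinates on the same line; it says
nothing about whether the GI number is right. alignment_fits is a
hypothetical helper.

```python
# (sseqid, column-3 value, sstart, send) copied from the five hits above
ROWS = [
    ("324034994", 527, 22, 283),
    ("56181650", 513, 25, 285),
    ("314913953", 582, 24, 285),
    ("305670062", 281, 32, 281),
    ("310814066", 1180, 24, 282),
]


def alignment_fits(slen, sstart, send):
    """True if the subject coordinates lie within 1..slen.

    min/max are used because sstart can exceed send for hits on the
    reverse strand.
    """
    return min(sstart, send) >= 1 and max(sstart, send) <= slen


results = [(sseqid, alignment_fits(slen, sstart, send))
           for sseqid, slen, sstart, send in ROWS]

for sseqid, ok in results:
    print(sseqid, "consistent" if ok else "INCONSISTENT")
```

For these five hits every alignment fits within the listed length,
which is what one would expect if the length and coordinates come from
the same database record and only the reported GI is off.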



On Mon, Apr 23, 2012 at 11:05 AM, Peter Cock <p.j.a.c...@googlemail.com> wrote:
> On Mon, Apr 23, 2012 at 5:56 PM, Sarah Hicks <garlicsc...@gmail.com> wrote:
>> I am having trouble finding information on the MegaBLAST output
>> columns. What is each column for? I can't seem to figure this out by
>> comparing info in the columns to NCBI directly because the GI#'s don't
>> match with the correct entry on NCBI. I've seen that others have
>> posted about that problem, so I'm also waiting on details on that
>> question, but for now, I'd just like to know what to make of the
>> output...
>> best,
>> Sarah
>
> I've not tried to track down this reported possible bug in GI numbers,
> and whether it also affects BLAST+ as well as the legacy NCBI BLAST
> (which has now been discontinued). Do you have a specific example?
>
> As to the 12 columns, they are standard BLAST tabular output, and
> should match the defaults in BLAST+ tabular output which are:
>
> Column  NCBI name  Description
> 1       qseqid     Query Seq-id (ID of your sequence)
> 2       sseqid     Subject Seq-id (ID of the database hit)
> 3       pident     Percentage of identical matches
> 4       length     Alignment length
> 5       mismatch   Number of mismatches
> 6       gapopen    Number of gap openings
> 7       qstart     Start of alignment in query
> 8       qend       End of alignment in query
> 9       sstart     Start of alignment in subject (database hit)
> 10      send       End of alignment in subject (database hit)
> 11      evalue     Expectation value (E-value)
> 12      bitscore   Bit score
>
> Peter

___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/
