Dear Jennifer,

My .fasta reference genome is like this:

'>gi|289546492|ref|NC_011420.2| Rhodospirillum centenum SW chromosome,
complete genome'

and the SAM file generated by Bowtie contains:

@SQ     SN:gi|289546492|ref|NC_011420.2|        LN:4355543

So I think they are all the same as "NC_011420.2". Is there anything
else I can try?
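
One way to test this is an exact string comparison of the three identifiers. The following is only a minimal sketch. It assumes the aligner takes the text after ">" up to the first whitespace as the sequence name (which is consistent with the @SQ line above). The helper names are hypothetical, and the GFF line is shortened from the annotation quoted further down in this thread.

```python
def fasta_name(header_line):
    """Return the sequence name: text after '>' up to the first whitespace."""
    return header_line.lstrip(">").split()[0]


def sam_sq_name(sq_line):
    """Return the value of the SN: field from a SAM @SQ header line."""
    for field in sq_line.split():
        if field.startswith("SN:"):
            return field[3:]
    raise ValueError("no SN: field in @SQ line")


def gff_seqid(record_line):
    """Return column 1 (the seqid) of a GFF3 feature line."""
    return record_line.split()[0]


# Lines copied (and, for the GFF record, shortened) from this thread.
fasta_header = (">gi|289546492|ref|NC_011420.2| Rhodospirillum centenum SW "
                "chromosome, complete genome")
sq_line = "@SQ\tSN:gi|289546492|ref|NC_011420.2|\tLN:4355543"
gff_line = "NC_011420.2\tRefSeq\tgene\t11\t3343\t.\t+\t.\tID=gene0"

names = {
    "fasta": fasta_name(fasta_header),
    "sam": sam_sq_name(sq_line),
    "gff": gff_seqid(gff_line),
}

for source, name in names.items():
    print(source, repr(name))

# Exact string equality is the relevant test, not "contains the accession".
print("fasta == sam:", names["fasta"] == names["sam"])
print("sam == gff:", names["sam"] == names["gff"])
```

With the lines quoted in this thread, the FASTA and SAM names agree with each other, but both carry the full "gi|...|ref|...|" form, so neither is the same string as the GFF seqid "NC_011420.2".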

Thank you,

Qian
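
If the names do turn out to differ, one possible fix (an assumption on my side, not something confirmed in this thread) is to shorten the FASTA header to the bare accession so that it matches the GFF, and then redo the alignment against the renamed reference. A minimal sketch follows; the function names and the regular expression are hypothetical and only cover the "gi|number|db|accession|" header style shown above.

```python
import io
import re

# Matches NCBI-style names such as gi|289546492|ref|NC_011420.2|
NCBI_NAME = re.compile(r"^gi\|\d+\|[a-z]+\|([^|]+)\|$")


def short_name(name):
    """Return the bare accession for an NCBI-style name, else the name unchanged."""
    match = NCBI_NAME.match(name)
    return match.group(1) if match else name


def rename_fasta(in_handle, out_handle):
    """Copy a FASTA stream, shortening the first token of each header line."""
    for line in in_handle:
        if line.startswith(">"):
            parts = line[1:].rstrip("\n").split(None, 1)
            if parts:
                parts[0] = short_name(parts[0])
            out_handle.write(">" + " ".join(parts) + "\n")
        else:
            out_handle.write(line)


# Small in-memory demonstration; the sequence line is made-up placeholder data.
demo_in = io.StringIO(
    ">gi|289546492|ref|NC_011420.2| Rhodospirillum centenum SW chromosome, "
    "complete genome\n"
    "ACGTACGT\n"
)
demo_out = io.StringIO()
rename_fasta(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

For real files, the same function can be called with two open file handles in place of the StringIO objects. Changing column 1 of the GFF to the long name would be the mirror-image approach; which side to rename is a matter of convenience.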


On Wed, Sep 26, 2012 at 3:35 PM, Jennifer Jackson <j...@bx.psu.edu> wrote:

> Hello,
>
> The first thing to double check is that the chromosome identifier is an
> exact match between the reference genome and the reference annotation.
>
> The GFF3 file is naming the chromosome "NC_011420.2".
>
> The reference genome chromosome should be named exactly the same way.
> Check this in the input BAM/SAM datasets or the original .fasta reference
> genome.
>
> Hopefully this finds the problem. Correcting mismatched names (which can
> arise for various reasons) is the most common solution to this sort of
> issue. See 'Tools on the Main server: Example', bullet item #2:
> http://wiki.g2.bx.psu.edu/Support#Interpreting_scientific_results
>
> Best,
>
> Jen
> Galaxy team
>
>
>
> On 9/26/12 8:46 AM, Qian Dong wrote:
>
>> Dear Team,
>>
>> I've been having a problem with Cufflinks regarding GFF files. I searched
>> the mailing list first but could not find an answer. Could you help me
>> look into this?
>>
>> I downloaded my genome annotation GFF file from NCBI (I soon realized the
>> NCBI format may be a problem) for my bacterial RNA-seq data analysis. My
>> GFF file looks like the following:
>>
>> '
>> ##gff-version 3
>> #!gff-spec-version 1.20
>> #!processor NCBI annotwriter
>> ##sequence-region NC_011420.2 1 4355543
>> ##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=414684
>> NC_011420.2     RefSeq  region  1       4355543 .       +       .       ID=id0;Dbxref=taxon:414684;Is_circular=true;culture-collection=ATCC:51521;gb-synonym=Rhodocista centenaria SW;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=SW%3B ATCC 51521
>> NC_011420.2     RefSeq  gene    11      3343    .       +       .       ID=gene0;Name=RC1_0011;Dbxref=GeneID:7008893;gbkey=Gene;locus_tag=RC1_0011
>> NC_011420.2     RefSeq  CDS     11      3343    .       +       0       ID=cds0;Name=YP_002296275.1;Parent=gene0;Note=Contains a type I secretion target ggxgxdxxx repeat %282 copies%29 domain%3B Contains a Cadherin domain%3B identified by match to protein family HMM PF02789;Dbxref=Genbank:YP_002296275.1,GeneID:7008893;gbkey=CDS;product=hypothetical protein;protein_id=YP_002296275.1;transl_table=11
>>
>>
>> I used this file for Cufflinks, but all the FPKM values are 0. I checked
>> this page: http://cufflinks.cbcb.umd.edu/gff.html and thought that
>> the problem might be that I don't have any mRNA features in my GFF
>> file. Since I am dealing with a bacterial genome, no exon/intron or
>> UTR information is needed. I therefore modified my GFF file as follows:
>>
>> ##gff-version 3
>> #!gff-spec-version 1.20
>> #!processor NCBI annotwriter
>> ##sequence-region NC_011420.2 1 4355543
>> ##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=414684
>> NC_011420.2     RefSeq  region  1       4355543 .       +       .       ID=id0;Dbxref=taxon:414684;Is_circular=true;culture-collection=ATCC:51521;gb-synonym=Rhodocista centenaria SW;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=SW%3B ATCC 51521
>> NC_011420.2     RefSeq  mRNA    11      3343    .       +       .       ID=mRNA0;Name=RC1_0011;Dbxref=GeneID:7008893;gbkey=Gene;locus_tag=RC1_0011
>> NC_011420.2     RefSeq  CDS     11      3343    .       +       0       ID=cds0;Name=YP_002296275.1;Parent=mRNA0;Note=Contains a type I secretion target ggxgxdxxx repeat %282 copies%29 domain%3B Contains a Cadherin domain%3B identified by match to protein family HMM PF02789;Dbxref=Genbank:YP_002296275.1,GeneID:7008893;gbkey=CDS;product=hypothetical protein;protein_id=YP_002296275.1;transl_table=11
>>
>>
>>
>> I re-ran Cufflinks, but this time an error was reported. All I can
>> tell from the report is that there was a segmentation fault, with no
>> further details. The report is as follows:
>>
>> Error running cufflinks.
>> return code = 139
>> Command line:
>> cufflinks -q --no-update-check -I 100 -F 0.100000 -j 0.150000 -p 4 -G
>> /galaxy/test_pool/pool5/files/000/327/dataset_327777.dat
>> /galaxy/test_database/files/000/325/dataset_325086.dat
>> [19:41:41] Loading reference annotation.
>> Segmentation fault
>>
>> cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/global_model.txt': No such file or directory
>> cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/isoforms.fpkm_tracking': No such file or directory
>> cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/genes.fpkm_tracking': No such file or directory
>>
>>
>> My questions are:
>>
>> 1. Is there any way to modify an NCBI bacterial genome annotation GFF
>> file to make it usable for Cufflinks? Our genome annotation is only
>> available from NCBI, not Ensembl or UCSC, so this is pretty much my only
>> choice.
>>
>> 2. Should I proceed with modifying the GFF file, or should I convert it
>> to GTF and use the GTF in Cufflinks instead?
>>
>> I am a biochemist and really new to the computer world so any advice
>> will help!
>>
>> Thanks a lot,
>>
>> Qian
>> --
>> Qian Dong
>> Bauer Lab, MCBD
>> Simon Hall: 313-317
>> 212 S. Hawthorne Dr.
>> Bloomington, IN 47405
>> Email:do...@indiana.edu
>> Lab Phone:812-855-8443
>>
>>
>>
>> ___________________________________________________________
>> The Galaxy User list should be used for the discussion of
>> Galaxy analysis and other features on the public server
>> at usegalaxy.org.  Please keep all replies on the list by
>> using "reply all" in your mail client.  For discussion of
>> local Galaxy instances and the Galaxy source code, please
>> use the Galaxy Development list:
>>
>>    http://lists.bx.psu.edu/listinfo/galaxy-dev
>>
>> To manage your subscriptions to this and other Galaxy lists,
>> please use the interface at:
>>
>>    http://lists.bx.psu.edu/
>>
>>
> --
> Jennifer Jackson
> http://galaxyproject.org
>



-- 
Qian Dong
Bauer Lab, MCBD
Simon Hall: 313-317
212 S. Hawthorne Dr.
Bloomington, IN 47405
Email:do...@indiana.edu
Lab Phone:812-855-8443