Re: [galaxy-user] all FPKMs are 0 in the tmap files produced by cuffcompare

Yang Bi Mon, 13 Jan 2014 18:32:36 -0800

Hi Jen:

Thank you for the prompt reply. The FPKMs produced by Cufflinks look normal 
(from an assembled transcripts file):


Seqname Source  Feature Start   End     Score   Strand  Frame   Attributes
chr1    Cufflinks       transcript      11960   13178   1000    .       .       gene_id "CUFF.180"; transcript_id "CUFF.180.1"; FPKM "6.5441928094"; frac "1.000000"; conf_lo "3.594986"; conf_hi "8.987465"; cov "2.413218"; full_read_support "yes";
chr1    Cufflinks       exon    11960   13178   1000    .       .       gene_id "CUFF.180"; transcript_id "CUFF.180.1"; exon_number "1"; FPKM "6.5441928094"; frac "1.000000"; conf_lo "3.594986"; conf_hi "8.987465"; cov "2.413218";
chr1    Cufflinks       transcript      4536    5314    1000    +       .       gene_id "CUFF.178"; transcript_id "CUFF.178.1"; FPKM "11.0556332840"; frac "1.000000"; conf_lo "3.645830"; conf_hi "13.216134"; cov "4.076844"; full_read_support "no";
chr1    Cufflinks       exon    4536    4605    1000    +       .       gene_id "CUFF.178"; transcript_id "CUFF.178.1"; exon_number "1"; FPKM "11.0556332840"; frac "1.000000"; conf_lo "3.645830"; conf_hi "13.216134"; cov "4.076844";
chr1    Cufflinks       exon    4706    5095    1000    +       .       gene_id "CUFF.178"; transcript_id "CUFF.178.1"; exon_number "2"; FPKM "11.0556332840"; frac "1.000000"; conf_lo "3.645830"; conf_hi "13.216134"; cov "4.076844";
chr1    Cufflinks       exon    5174    5314    1000    +       .       gene_id "CUFF.178"; transcript_id "CUFF.178.1"; exon_number "3"; FPKM "11.0556332840"; frac "1.000000"; conf_lo "3.645830"; conf_hi "13.216134"; cov "4.076844";

I checked the chromosome names and realized that the BAM outputs use lowercase 
for "RNAME", e.g. "chr1", while my gff3 file uses an initial capital letter 
for "seqId", e.g. "Chr1". Could this be the problem? What is the fastest way to 
convert the capital C in my gff3 file to lowercase?
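
One possible way to do the conversion outside Galaxy is a short Python script. This is only a sketch under assumptions: it assumes every sequence name in the gff3 file starts with "Chr", that the names in the BAM header differ only in that first letter (so "ChrC" would become "chrC", which should be checked against the BAM header), and that the file has no embedded FASTA section. The function names are made up for illustration.

```python
def _lower_prefix(seqid):
    """Turn a leading 'Chr' into 'chr'; leave any other name unchanged."""
    if seqid.startswith("Chr"):
        return "chr" + seqid[3:]
    return seqid


def fix_seqid(line):
    """Return one GFF3 line with its sequence name prefix lowercased."""
    body = line.rstrip("\n")
    newline = line[len(body):]
    # The ##sequence-region directive also carries a sequence name.
    if body.startswith("##sequence-region"):
        fields = body.split()
        if len(fields) >= 2:
            fields[1] = _lower_prefix(fields[1])
        return " ".join(fields) + newline
    # Other comments, directives, and blank lines pass through untouched.
    if body.startswith("#") or not body:
        return line
    # Feature lines: the sequence name is the first tab-separated column.
    seqid, tab, rest = body.partition("\t")
    return _lower_prefix(seqid) + tab + rest + newline


def convert_file(src_path, dst_path):
    """Copy src_path to dst_path, rewriting the sequence name on every line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(fix_seqid(line))
```

Calling convert_file with the path of the original gff3 and a new output path writes a corrected copy and leaves the original file untouched, so the two can be compared before re-uploading.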

Thank you very much
Yang

----- Original Message -----
From: "Jennifer Jackson" <[email protected]>
To: "Yang Bi" <[email protected]>, [email protected]
Sent: Monday, January 13, 2014 10:56:39 AM
Subject: Re: [galaxy-user] all FPKMs are 0 in the tmap files produced by cuffcompare

Hello,

It looks like the data is mapping as novel - not linked with the 
reference annotation. There can be a few factors that can cause this to 
occur for part of a dataset (often desirable) but when it occurs for an 
entire dataset, there is often a data mismatch or parameter issue.

The first item I always check is that the reference genomes are a match 
between inputs. Do this by confirming that the identifiers in the 
reference GFF file are the same as those in the Tophat BAM output 
(convert to SAM, with headers, to see the chromosome names). For the GFF 
file, running the tool "Join, Subtract and Group -> Group" on the first 
column (chromosome name) with the action "count distinct" will isolate these.
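
For anyone doing this check outside Galaxy, the comparison can be sketched in a few lines of Python. This is an illustration under assumptions, not part of any Galaxy tool: it assumes the BAM has already been converted to SAM with headers, that sequence names appear in the "SN:" field of the "@SQ" header lines, and that the GFF has no embedded FASTA section. The function names are made up.

```python
def gff_seqids(gff_lines):
    """Collect the distinct values of column 1 from GFF feature lines."""
    names = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqids(sam_lines):
    """Collect reference sequence names from the @SQ lines of a SAM header."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment line
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def compare_seqids(gff_lines, sam_lines):
    """Report which sequence names are shared and which occur on one side only."""
    gff = gff_seqids(gff_lines)
    sam = sam_header_seqids(sam_lines)
    return {
        "shared": sorted(gff & sam),
        "gff_only": sorted(gff - sam),
        "sam_only": sorted(sam - gff),
    }
```

An empty "shared" list together with near-identical names in "gff_only" and "sam_only" (for example "Chr1" against "chr1") points to a naming mismatch rather than a parameter problem.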

But the real problem could be in the parameters, see below:

On 1/11/14 10:43 PM, Yang Bi wrote:
> Dear all:
>
> I am new to Galaxy and I followed online tutorials/tips to analyze my RNA-seq 
> data for alternative splicing. I used "Tophat for Illumina" to align my 
> sequencing data after QC/filtering. Other than setting min intron to 20, I 
> used the default settings. Then I fed the accepted hits files to Cufflinks. I 
> set Min isoform fraction to 0, used the annotation (tair10 gff3) as a guide, 
> and chose yes for perform bias correction (locally cached tair10).
My guess is that this Cufflinks run had the same issue - have you 
checked it? The 'Min isoform fraction' set to "0" may be problematic (I 
have never run Cufflinks this way). It may seem that this is a setting 
that is permissive - to capture even very small expression levels - but 
it may have had the reverse effect of not assigning any reads.

(The Tophat run with min intron at 20 is pretty low/sensitive - but with 
a smaller genome this probably will not cause memory issues with the 
mapping. Was this set based on the genome having transcripts with known, 
characterized introns this short? I didn't check, but you can in the 
reference GFF file.)
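
One way to check intron lengths in the reference GFF is sketched below in Python. It rests on assumptions: that the file is GFF3 with "exon" features whose "Parent" attribute names the transcript, and that an intron is simply the gap between consecutive exons of the same transcript. The function name is made up for illustration.

```python
from collections import defaultdict


def intron_lengths(gff_lines):
    """Return the lengths of all gaps between consecutive exons per transcript."""
    exons = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        # Attributes look like "Parent=AT1G01010.1"; Parent may list several IDs.
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "Parent":
                for parent in value.split(","):
                    exons[parent].append((start, end))
    lengths = []
    for coords in exons.values():
        coords.sort()
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            gap = next_start - prev_end - 1
            if gap > 0:  # ignore touching or overlapping exons
                lengths.append(gap)
    return lengths
```

Taking min() of the returned list (when it is not empty) gives the shortest annotated intron, which can then be compared against the min intron setting used for Tophat.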

Maybe double-check the Cufflinks run above and confirm the results were as 
expected, then try the Cufflinks default ("0.1") as a first-pass test. If you 
want to make this more sensitive in a subsequent run, you could try "0.01", 
although how significant those results are, given this genome and your 
specific input data, would need to be evaluated.

After that, if you are still having trouble, please feel free to share a 
history link and we can try to help (copy and email a share link from 
the public server, direct to me, to keep your data private). Here is how:
https://wiki.galaxyproject.org/Support#Shared_and_Published_data

Hopefully the parameter change works, or a reference genome issue is 
found and corrected, but if not, I'll watch for your email,

Jen
Galaxy team

> I merged the assembled transcripts with cuffmerge and used cuffcompare to 
> compare the resulting merged transcripts to the reference annotation 
> file tair10 gff3. I chose yes for "use sequence data" and locally cached 
> tair10 as the "reference list". I get this for the transcript accuracy 
> analysis:
>
> # Cuffcompare v2.1.1 | Command line was:
> #cuffcompare -o cc_output -r 
> /galaxy-repl/main/files/007/386/dataset_7386886.dat -s 
> /galaxy/data/Arabidopsis_thaliana_TAIR10/sam_index/Arabidopsis_thaliana_TAIR10.fa
>  ./input1
> #
>
> #= Summary for dataset: ./input1 :
> #     Query mRNAs :   72778 in   51779 loci  (57559 multi-exon transcripts)
> #            (12679 multi-transcript loci, ~1.4 transcripts per locus)
> # Reference mRNAs :   42163 in   33350 loci  (30127 multi-exon)
> # Corresponding super-loci:          33140
> #--------------------|   Sn   |  Sp   |  fSn |  fSp
>          Base level:  100.0    62.7     -       -
>          Exon level:  104.6    59.5   100.0    60.5
>        Intron level:  100.0    55.5   100.0    56.5
> Intron chain level:    98.3    51.5   100.0    60.3
>    Transcript level:   98.7    57.2    94.8    54.9
>         Locus level:   99.4    64.0   100.0    64.1
>
>       Matching intron chains:   29618
>                Matching loci:   33147
>
>            Missed exons:       1/169820       (  0.0%)
>             Novel exons:  128021/298149       ( 42.9%)
>          Missed introns:       0/127896       (  0.0%)
>           Novel introns:  102614/230568       ( 44.5%)
>             Missed loci:       1/33350        (  0.0%)
>              Novel loci:    2962/51779        (  5.7%)
>
>   Total union super-loci across all input datasets: 51779
>
> For the tmap file, all my FPKMs are 0:
>
> ref_gene_id   ref_id  class_code      cuff_gene_id    cuff_id FMI     FPKM    FPKM_conf_lo    FPKM_conf_hi    cov     len     major_iso_id    ref_match_len
> AT1G01010     AT1G01010.1     =       AT1G01010       TCONS_00000001  0       0.000000        0.000000        0.000000        0.000000        1688    TCONS_00000001  1688
> AT1G01040     AT1G01040.1     =       AT1G01040       TCONS_00000002  0       0.000000        0.000000        0.000000        0.000000        6251    TCONS_00000002  6251
> AT1G01040     AT1G01040.2     =       AT1G01040       TCONS_00000003  0       0.000000        0.000000        0.000000        0.000000        5877    TCONS_00000002  5877
> AT1G01046     AT1G01046.1     =       AT1G01046       TCONS_00000004  0       0.000000        0.000000        0.000000        0.000000        207     TCONS_00000004  207
> AT1G01073     AT1G01073.1     =       AT1G01073       TCONS_00000005  0       0.000000        0.000000        0.000000        0.000000        111     TCONS_00000005  111
> AT1G01110     AT1G01110.2     =       AT1G01110       TCONS_00000006  0       0.000000        0.000000        0.000000        0.000000        1782    TCONS_00000006  1782
> AT1G01110     AT1G01110.1     =       AT1G01110       TCONS_00000007  0       0.000000        0.000000        0.000000        0.000000        1439    TCONS_00000006  1439
> AT1G01115     AT1G01115.1     =       AT1G01115       TCONS_00000008  0       0.000000        0.000000        0.000000        0.000000        117     TCONS_00000008  117
> AT1G01160     AT1G01160.1     =       AT1G01160       TCONS_00000009  0       0.000000        0.000000        0.000000        0.000000        1045    TCONS_00000010  1045
> AT1G01160     AT1G01160.2     =       AT1G01160       TCONS_00000010  0       0.000000        0.000000        0.000000        0.000000        1129    TCONS_00000010  1129
> AT1G01180     AT1G01180.1     =       AT1G01180       TCONS_00000011  0       0.000000        0.000000        0.000000        0.000000        1176    TCONS_00000011  1176
> AT1G01210     AT1G01210.1     =       AT1G01210       TCONS_00000012  0       0.000000        0.000000        0.000000        0.000000        616     TCONS_00000012  616
> AT1G01220     AT1G01220.1     =       AT1G01220       TCONS_00000013  0       0.000000        0.000000        0.000000        0.000000        3532    TCONS_00000013  3532
>
> The FPKMs were normal in the assembled transcripts produced by Cufflinks.
>
> Please enlighten me on the possible mistakes that I have made. I really 
> appreciate your help.
>
> Best
> Yang
> ___________________________________________________________
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org.  Please keep all replies on the list by
> using "reply all" in your mail client.  For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
>
>    http://lists.bx.psu.edu/listinfo/galaxy-dev
>
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
>
>    http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
>
>    http://galaxyproject.org/search/mailinglists/

-- 
Jennifer Hillman-Jackson
http://galaxyproject.org

