[galaxy-dev] Cufflinks creates invalid output with duplicate GFF IDs

Ganote, Carrie L Mon, 14 Apr 2014 12:48:07 -0700

Hi List,

I have a user running a Tuxedo pipeline on our local Galaxy but it has been 
fraught with errors.


The first issue was running cufflinks with an annotation file that had 
duplicate IDs. I fixed it by running the following:
awk '($3 == "exon" || $3 == "CDS")' dataset_8640.dat >> newref
Just tossing out the transcript and gene lines seemed to help.
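
For anyone who would rather do the same filtering outside awk, here is a minimal Python sketch of that step. The function name keep_exon_and_cds is made up for illustration and is not part of Galaxy or Cufflinks. It splits on whitespace the way awk does by default and keeps only lines whose third field is exon or CDS.

```python
def keep_exon_and_cds(lines):
    """Yield only annotation lines whose third field is 'exon' or 'CDS'.

    Hypothetical helper: mirrors awk '($3 == "exon" || $3 == "CDS")',
    which splits each line on runs of whitespace by default.
    """
    for line in lines:
        # Split off the first three fields only; the rest of the line is untouched.
        fields = line.split(None, 3)
        if len(fields) >= 3 and fields[2] in ("exon", "CDS"):
            yield line


if __name__ == "__main__":
    import sys

    # Usage: python filter_features.py < dataset_8640.dat > newref
    sys.stdout.writelines(keep_exon_and_cds(sys.stdin))
```

Note that the awk command above uses >>, which appends, so running it twice against the same newref would write every kept line twice. Writing to a fresh file each time avoids that.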

The second issue was "Error: sequence lines in a FASTA record must have the 
same length!"
Converting to and from tabular fixed that.
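
My assumption is that the tabular round trip helped because it rewrote every sequence with a uniform line width. If that is right, the same effect can be sketched directly in Python. The function name rewrap_fasta and the default width of 60 are illustrative choices, not something Galaxy or Cufflinks prescribes.

```python
def rewrap_fasta(lines, width=60):
    """Yield FASTA lines with each record's sequence re-wrapped to `width`.

    Hypothetical helper: header lines pass through unchanged; sequence
    lines of a record are joined and then split into equal-width chunks
    (only the last chunk of a record may be shorter).
    """
    seq_chunks = []

    def flush():
        # Emit the buffered sequence of the previous record, if any.
        sequence = "".join(seq_chunks)
        seq_chunks.clear()
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield from flush()
            yield line
        else:
            seq_chunks.append(line)
    yield from flush()


if __name__ == "__main__":
    import sys

    # Usage: python rewrap_fasta.py < genome.fa > genome.rewrapped.fa
    for out_line in rewrap_fasta(sys.stdin):
        print(out_line)
```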

Now the issue is with cuffmerge. The user has run Tophat on a pair of fastq 
files and a fasta genome. They then ran cufflinks on the Tophat assembly with 
an annotation file (not the previously mentioned one). This worked. But using 
the gtf file produced by cufflinks in a cuffmerge step results in:
Error running cuffmerge. 
[Sat Mar 22 14:00:10 2014] Beginning transcriptome assembly merge
-------------------------------------------

[Sat Mar 22 14:00:10 2014] Preparing output location cm_output/
[Sat Mar 22 14:02:37 2014] Converting GTF files to SAM
[14:02:38] Loading reference annotation.
Error: duplicate GFF ID 'CDS:GBG_brugia_K07A12.4b' encountered!
        [FAILED]
Error: could not execute gtf_to_sam
I took out the sequence and annotation file in the cuffmerge step with no 
change in result. I ran gffread on the cufflinks output and sure enough, it 
explodes. But why would Cufflinks create an invalid file?

The file itself has these entries:
Bmal_v3_scaffold1   Cufflinks   transcript   280476  280630  1   +   .   gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
Bmal_v3_scaffold1   Cufflinks   exon   280476  280630  1   +   .   gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

Bmal_v3_scaffold1   Cufflinks   transcript   281149  281207  1   +   .   gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
Bmal_v3_scaffold1   Cufflinks   exon   281149  281207  1   +   .   gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

# Shortening for brevity:
Bmal_v3_scaffold1   Cufflinks   transcript   281521  281622  1  ...
Bmal_v3_scaffold1   Cufflinks   exon    281521  281622  1  ...

Bmal_v3_scaffold1   Cufflinks   transcript   281743  281863  1  ...
Bmal_v3_scaffold1   Cufflinks   exon   281743  281863  1  ...

Bmal_v3_scaffold1   Cufflinks   transcript   282355  282537  1  ...
Bmal_v3_scaffold1   Cufflinks   exon   282355  282537  1  ...

Bmal_v3_scaffold1   Cufflinks   transcript   283063  283190  1  ...
Bmal_v3_scaffold1   Cufflinks   exon   283063  283190  1  ...

Bmal_v3_scaffold1   Cufflinks   transcript   283879  284035  1  ...
Bmal_v3_scaffold1   Cufflinks   exon   283879  284035  1  ...

Bmal_v3_scaffold1   Cufflinks   transcript   280652  280683  1  ...
Bmal_v3_scaffold1   Cufflinks   exon   280652  280683  1  ...
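
In the excerpt above, the same transcript_id appears on several transcript records with different coordinates, and gene_id is empty. A quick way to list every ID affected this way is sketched below in Python. It assumes a tab-delimited GTF with the attributes in column 9, and the function name duplicated_transcript_ids is made up for illustration.

```python
import re
from collections import Counter

# Matches the transcript_id attribute as it appears in the records above.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def duplicated_transcript_ids(lines):
    """Return {transcript_id: count} for IDs on more than one 'transcript' row.

    Hypothetical helper; assumes tab-separated GTF with 9 columns.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only count transcript-level records
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return {tid: n for tid, n in counts.items() if n > 1}


if __name__ == "__main__":
    import sys

    # Usage: python find_dup_ids.py < cufflinks_output.gtf
    for tid, n in sorted(duplicated_transcript_ids(sys.stdin).items()):
        print(f"{tid}\t{n}")
```

Running this on both the input annotation and the cufflinks output would show whether the repeated IDs were already present before cufflinks ran.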

Here's my setup:
Galaxy changeset: dc067a95261d was my last pull

$ cuffmerge --version
merge_cuff_asms v1.0.0 
$ cufflinks 
cufflinks v2.2.0
linked against Boost version 104700
$ tophat --version
TopHat v1.3.3

Using tools:
Cuffdiff devteam revision: 604fa75232a2
Cufflinks devteam revision: 9aab29e159a7
Cuffmerge devteam revision: 424d49834830
Tophat devteam revision: 1030acbecce6 

Has anyone else seen this? I'm re-running the workflow from scratch but I don't 
really have any leads.

Sincerely,

Carrie Ganote