Hi List,
I have a user running a Tuxedo pipeline on our local Galaxy but it has been
fraught with errors.
The first issue was running cufflinks with an annotation file that had
duplicate IDs. I fixed that by running the following:
awk '($3 == "exon" || $3 == "CDS")' dataset_8640.dat >> newref
Just tossing out the transcript and gene lines seemed to help.
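For anyone who would rather do the same filtering outside awk, here is a
minimal Python sketch of it. The function name keep_features is made up for
illustration, and it assumes the columns are tab-separated (awk splits on any
whitespace, so the two are not strictly identical).

```python
def keep_features(lines, wanted=("exon", "CDS")):
    """Yield only the GTF/GFF lines whose third column is in `wanted`.

    Assumption: columns are tab-separated; comment lines are dropped.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] in wanted:
            yield line


# Usage (file names taken from the awk command above):
# with open("dataset_8640.dat") as src, open("newref", "w") as dst:
#     dst.writelines(keep_features(src))
```

One thing to watch with the awk line: >> appends, so running it a second time
would leave every record in newref twice. Opening the output in write mode, as
in the usage comment, avoids that.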
The second issue was this error:
Error: sequence lines in a FASTA record must have the same length!
Converting the FASTA file to tabular and back fixed that.
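My understanding is that the round trip through tabular helps because it
joins each sequence onto one line and then wraps it again at a fixed width.
A rough Python sketch of that rewrapping, done directly, is below. The
function name rewrap_fasta and the width of 60 are my own choices, not
anything Galaxy or Cufflinks prescribes.

```python
def rewrap_fasta(lines, width=60):
    """Yield FASTA lines so that every sequence line in a record has the
    same length (`width`), except possibly the last line of the record.

    Assumption: each record fits in memory, since it is joined before
    being re-split.
    """
    seq = []

    def flush():
        joined = "".join(seq)
        for i in range(0, len(joined), width):
            yield joined[i:i + width]

    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines inside a record are dropped
        if line.startswith(">"):
            yield from flush()  # finish the previous record first
            seq.clear()
            yield line
        else:
            seq.append(line)
    yield from flush()


# Usage (hypothetical file names):
# with open("genome.fa") as src, open("genome.rewrapped.fa", "w") as dst:
#     for out_line in rewrap_fasta(src):
#         dst.write(out_line + "\n")
```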
Now the issue is with cuffmerge. The user ran Tophat on a pair of fastq
files and a fasta genome. They then ran cufflinks on the Tophat output with
an annotation file (not the previously mentioned one). This worked. But using
the GTF file produced by cufflinks in a cuffmerge step results in:
Error running cuffmerge.
[Sat Mar 22 14:00:10 2014] Beginning transcriptome assembly merge
-------------------------------------------
[Sat Mar 22 14:00:10 2014] Preparing output location cm_output/
[Sat Mar 22 14:02:37 2014] Converting GTF files to SAM
[14:02:38] Loading reference annotation.
Error: duplicate GFF ID 'CDS:GBG_brugia_K07A12.4b' encountered!
[FAILED]
Error: could not execute gtf_to_sam
I removed the sequence and annotation files from the cuffmerge step, with no
change in the result. I ran gffread on the cufflinks output and, sure enough,
it blows up as well. But why would Cufflinks create an invalid file?
The file itself has these entries:
Bmal_v3_scaffold1 Cufflinks transcript 280476 280630 1 + . gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
Bmal_v3_scaffold1 Cufflinks exon 280476 280630 1 + . gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
Bmal_v3_scaffold1 Cufflinks transcript 281149 281207 1 + . gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
Bmal_v3_scaffold1 Cufflinks exon 281149 281207 1 + . gene_id ""; transcript_id "CDS:GBG_brugia_K07A12.4b"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
# Shortening for brevity:
Bmal_v3_scaffold1 Cufflinks transcript 281521 281622 1 ...
Bmal_v3_scaffold1 Cufflinks exon 281521 281622 1 ...
Bmal_v3_scaffold1 Cufflinks transcript 281743 281863 1 ...
Bmal_v3_scaffold1 Cufflinks exon 281743 281863 1 ...
Bmal_v3_scaffold1 Cufflinks transcript 282355 282537 1 ...
Bmal_v3_scaffold1 Cufflinks exon 282355 282537 1 ...
Bmal_v3_scaffold1 Cufflinks transcript 283063 283190 1 ...
Bmal_v3_scaffold1 Cufflinks exon 283063 283190 1 ...
Bmal_v3_scaffold1 Cufflinks transcript 283879 284035 1 ...
Bmal_v3_scaffold1 Cufflinks exon 283879 284035 1 ...
Bmal_v3_scaffold1 Cufflinks transcript 280652 280683 1 ...
Bmal_v3_scaffold1 Cufflinks exon 280652 280683 1 ...
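The pattern in the entries above is the same transcript_id showing up on
several separate transcript records. To see how widespread that is in the
cufflinks output, something like the following Python sketch could be used.
The function name repeated_transcript_ids is made up for illustration, and it
assumes the real file is tab-separated with the attributes in column 9, as in
standard GTF.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def repeated_transcript_ids(lines):
    """Return {transcript_id: [(seqname, start, end), ...]} for every
    transcript_id that appears on more than one 'transcript' record.

    Assumption: tab-separated GTF with attributes in the ninth column.
    """
    seen = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            seen[match.group(1)].append(
                (fields[0], int(fields[3]), int(fields[4]))
            )
    return {tid: locs for tid, locs in seen.items() if len(locs) > 1}


# Usage (hypothetical file name for the cufflinks output):
# with open("transcripts.gtf") as handle:
#     for tid, locs in repeated_transcript_ids(handle).items():
#         print(tid, len(locs), locs[:3])
```

This only reports the repeats; it does not try to decide which record is the
right one or to repair the file.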
Here's my setup:
Galaxy changeset: dc067a95261d was my last pull
$ cuffmerge --version
merge_cuff_asms v1.0.0
$ cufflinks
cufflinks v2.2.0
linked against Boost version 104700
$ tophat --version
TopHat v1.3.3
Using tools:
Cuffdiff devteam revision: 604fa75232a2
Cufflinks devteam revision: 9aab29e159a7
Cuffmerge devteam revision: 424d49834830
Tophat devteam revision: 1030acbecce6
Has anyone else seen this? I'm re-running the workflow from scratch but I don't
really have any leads.
Sincerely,
Carrie Ganote