Thanks for the reply. I tried the script from a previous Galaxy thread
for adding 'chr' to the GTF file, running it in the Mac terminal, but
I keep getting this error:

awk: can't open file ensembl.gtf
 source line number 1

I am very new to using the terminal, so please let me know if there is
something basic that I am not doing right.


On Tue, Jun 28, 2011 at 6:13 AM, Jeremy Goecks wrote:

> Hello Kurinji,
> I was at your USC Galaxy seminar last week, which I found very helpful -
> thank you!
> Glad to hear that you found the workshop helpful. As a reminder, please
> email questions about using Galaxy and its tools to the galaxy-user mailing
> list (which I've cc'd). You may get quicker and different responses from
> community members, and everyone will benefit from the discussion.
> I used my recently generated RNAseq data in Galaxy (which was pre-aligned
> using tophat and already had cufflinks run on it) - I ran cuffcompare with
> all the gtf files and then cuffdiff for the three pairs (there is 1 control
> and 3 different drug treatments - no replicates). I got several output
> files, as expected, but decided just to look at the gene differential
> expression as a start. Some questions I have are -
> 1. (Very basic question!) Which is sample 1 (and corresponding value 1) and
> which is sample 2 (and corresponding value 2) in my output file? This is
> what my output file is called -
> 90: Cuffdiff on data 37, data 38, and data 60: gene differential expression
> testing 33,969 lines
> Is 37 sample one or sample two? Given the data - I would expect sample 37
> to correspond to "value 2" - but I could be wrong. Please let me know!
> The best way to figure out which dataset corresponds with Cuffdiff's labels
> is to click the rerun button in the dataset: sample names correspond
> directly to the reads datasets (i.e. BAM files) provided as input to
> Cuffdiff.
> 2. How do I find the UCSC gene names corresponding with start/end sites - I
> did input the hg18 UCSC gtf file as a reference
> You'll need to use a reference annotation (GTF file) that has the gene_name
> attribute as input for Cufflinks/Cuffcompare/Cuffdiff. Typically Ensembl
> annotations have this attribute; however, you'll need to prepend 'chr' to
> each line--really, to each chromosome name--in order to bring Ensembl
> notation in line with UCSC/Galaxy notation.
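
The 'chr' step described above can be sketched roughly as follows. This is
only an illustration, not the script from the earlier thread: the function
name add_chr_prefix and the file names are made up for this example. It
assumes a tab-separated GTF whose first column is the chromosome name. It
also checks that the input file is readable first, because awk prints
"can't open file" when the file is not in the folder the terminal is
currently in.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example only):
# prepend "chr" to the chromosome name in column 1 of a GTF file.
# Usage: add_chr_prefix INPUT.gtf OUTPUT.gtf
add_chr_prefix() {
    in_gtf=$1
    out_gtf=$2

    # awk can only open the file if the path is valid from the current
    # directory, so fail early with a more descriptive message.
    if [ ! -r "$in_gtf" ]; then
        echo "Cannot read '$in_gtf' from $(pwd)." >&2
        echo "cd into the folder holding the file, or give its full path." >&2
        return 1
    fi

    # Header/comment lines (starting with '#') are copied unchanged.
    # Names that already start with "chr" are left alone.
    # Everything else gets "chr" put in front of the first field.
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         $1 !~ /^chr/ { $1 = "chr" $1 }
         { print }' "$in_gtf" > "$out_gtf"
}
```

With the function defined, something like
"add_chr_prefix ensembl.gtf ensembl.chr.gtf" would write the converted copy
next to the original, provided ensembl.gtf is in the current folder (pwd
shows the folder, ls lists its files). One caveat: Ensembl names the
mitochondrial chromosome MT while UCSC uses chrM, so this sketch would
produce chrMT and that name may still need fixing by hand.
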
> Actually, I noticed that value 1 in this particular output file is all 0 -
> no idea why. It is not this way in the other files, making me wonder if
> there is an error somewhere. I am sure the bam file is okay as I viewed it
> on IGV and saw the patterns I would expect for some candidate genes I looked
> at.
> It's difficult for me to comment without seeing your analysis. Some output
> files depend on particular attributes being set correctly in the annotation
> file. You may want to search through our mailing list archives and see if
> your question has already been answered:
> Good luck,
> J.