I am running TopHat and Cufflinks on a bacterial genome.
As TopHat parameters, I set the minimum intron length to 15 bp and the maximum intron length to 1500 bp. On visual inspection the result looks decent: few splice junctions are identified (I do not expect many introns in my genome), although a few false ones appear to connect two different genes. This is the first thing I would like help with: is it worth simply reducing the max intron size to nothing? What is the accepted consensus when using TopHat on bacterial genomes?
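For reference, here is a minimal sketch of what those settings would look like as a command-line TopHat call, assuming the usual `--min-intron-length` / `--max-intron-length` options. The index prefix, reads file, and output directory names are placeholders I made up for illustration, and the helper function itself is hypothetical; it only assembles and prints the command, it does not run anything.

```python
import shlex


def build_tophat_command(index_prefix, reads_fastq,
                         min_intron=15, max_intron=1500,
                         output_dir="tophat_out"):
    """Assemble a TopHat argument list with the intron bounds described above.

    index_prefix, reads_fastq and output_dir are placeholder values,
    not the actual files used in this analysis.
    """
    return [
        "tophat",
        "--min-intron-length", str(min_intron),  # 15 bp in my run
        "--max-intron-length", str(max_intron),  # 1500 bp in my run
        "-o", output_dir,
        index_prefix,
        reads_fastq,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before being run by hand.
    cmd = build_tophat_command("bacterial_index", "reads.fastq")
    print(shlex.join(cmd))
```

Lowering `max_intron` in this sketch is the change I am asking about.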
When I look at the second TopHat output file, the accepted hits,
all hits align nicely with known genes. However, when I run
Cufflinks I run into the following issues. When I use a reference
genome, I get, in addition to the known transcripts, a number of
very long transcripts spanning very large genomic regions. Also,
two genes that are very near each other but run in opposite
directions (which you can see beautifully in the TopHat accepted
hits alignments, with a different color for each strand) get
merged into a single CUFF identifier. Is there any way I can
address this? Is there something I am missing with respect to
parameters I have to change because I am working on a bacterial
genome?