If the reference genome you are talking about is actually an assembled transcriptome, you most likely want to explore tools designed specifically for that purpose. Trinity is one such tool, but there are others in the Tool Shed and on some of the public Galaxy servers.


Your question is a bit confusing because the 'annotations' may already be what these tools would produce, so I am not sure what you are trying to do next. If the next step is assigning putative function, there are many paths to follow, some better suited to viral genomes than others. Find out what others doing this exact kind of work are using right now and consider the same tools. Start by checking out the public Galaxy servers; many have trial tools that you can later install from the Tool Shed into a local or cloud Galaxy instance: http://wiki.galaxyproject.org/PublicGalaxyServers

If I misunderstood your question (the reference genome is in fact a DNA genome and you have RNA sequence to align), then the RNA-seq pipeline can be used as-is with 'Tophat for SOLiD', Cufflinks, CuffMerge, and CuffDiff, all run on a local, cloud, or Slipstream Galaxy with the reference genome as a cluster reference genome. None of these tools requires a reference annotation. An annotation helps unlock their full functionality, especially with CuffDiff, but it is not required. More assistance is available at tophat.cuffli...@gmail.com.
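As a rough illustration of how those steps chain together, here is a minimal Python sketch that only assembles the command lines; it does not run them. It assumes the command-line versions of the tools rather than the Galaxy wrappers, a Bowtie 1 colorspace index (built with `bowtie-build -C`, since Bowtie 2 has no colorspace support), and hypothetical file and sample names. The annotation argument is optional: without it every step still runs, and Cuffdiff simply uses the merged Cufflinks assembly as its transcript set.

```python
# Sketch: build (not run) a colorspace TopHat -> Cufflinks -> Cuffmerge -> Cuffdiff pipeline.
# All file, index, and sample names are hypothetical placeholders.
import shlex
from typing import Dict, List, Optional, Tuple


def build_pipeline(
    index_prefix: str,
    genome_fasta: str,
    samples: Dict[str, Tuple[str, str]],
    annotation_gtf: Optional[str] = None,
) -> Tuple[List[List[str]], List[str]]:
    """Return (commands, assembly_list).

    samples maps a sample name to its (reads.csfasta, reads.qual) pair.
    annotation_gtf is optional; every command is still valid without it.
    """
    commands: List[List[str]] = []
    bams: List[str] = []
    assemblies: List[str] = []
    for name, (csfasta, qual) in samples.items():
        # --color: colorspace reads; --quals: qualities come in a separate file.
        tophat = ["tophat", "--color", "--quals", "-o", f"{name}_tophat"]
        if annotation_gtf:
            tophat += ["-G", annotation_gtf]  # known transcripts help junction discovery
        tophat += [index_prefix, csfasta, qual]
        commands.append(tophat)

        bam = f"{name}_tophat/accepted_hits.bam"
        bams.append(bam)
        cufflinks = ["cufflinks", "-o", f"{name}_cufflinks"]
        if annotation_gtf:
            cufflinks += ["-g", annotation_gtf]  # guide only; novel transcripts still reported
        cufflinks.append(bam)
        commands.append(cufflinks)
        assemblies.append(f"{name}_cufflinks/transcripts.gtf")

    # Cuffmerge reads a text file listing each sample's transcripts.gtf, one per line.
    cuffmerge = ["cuffmerge", "-o", "merged", "-s", genome_fasta]
    if annotation_gtf:
        cuffmerge += ["-g", annotation_gtf]
    cuffmerge.append("assemblies.txt")
    commands.append(cuffmerge)

    # Cuffdiff needs a transcript GTF; the merged assembly serves when there is no annotation.
    commands.append(["cuffdiff", "-o", "diff", "merged/merged.gtf", *bams])
    return commands, assemblies


if __name__ == "__main__":
    cmds, assembly_list = build_pipeline(
        "viral_cs",  # hypothetical colorspace Bowtie index prefix
        "viral.fa",
        {"mock": ("mock.csfasta", "mock.qual"), "infected": ("infected.csfasta", "infected.qual")},
    )
    print("# assemblies.txt:\n" + "\n".join(assembly_list))
    for cmd in cmds:
        print(shlex.join(cmd))
```

Passing `-g` (rather than `-G`) to Cufflinks is a deliberate choice in this sketch: `-G` restricts Cufflinks to quantifying the known transcripts, whereas `-g` uses them only as a guide, so unannotated and possibly novel transcripts can still be assembled and reported.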

So I am trying to find a novel gene using de novo transcriptome assembly, and I see that TopHat might be able to help me with my dilemma. The viral genome is not available on the Galaxy website, and the other issue is that I am using SOLiD data. So my question is: can I use TopHat with SOLiD data by converting it to base-space FASTQ, or do I have to use TopHat2 with a colorspace viral genome? I also have to admit that I am completely new to bioinformatics; my project has led me here, so I am trying to tackle it on my own.
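The conversion asked about here is possible, but it has a cost. A SOLiD read in csfasta form is a primer base followed by colors, each color encoding the transition between two adjacent bases. The minimal Python sketch below (the function name is just for illustration) decodes such a read with the standard two-base encoding, where mapping A, C, G, T to 0 through 3 makes each color the XOR of the two bases it spans. Because every base is derived from the one before it, a single miscalled color changes every base after it, which is generally why colorspace-aware aligners, such as TopHat run with `--color`, align in colorspace instead of on pre-decoded reads. The sketch ignores quality values.

```python
# Minimal colorspace (SOLiD two-base encoding) decoder sketch.
# With A=0, C=1, G=2, T=3, each color is the XOR of the two bases it spans.
BASES = "ACGT"


def decode_colorspace(read: str) -> str:
    """Decode a csfasta-style read (primer base + colors) into base space.

    The leading primer base is not part of the sequenced fragment, so it
    only seeds the decoding and is dropped from the result.
    """
    prev = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        if color not in "0123":
            # '.' marks a missing color call; this sketch does not try to recover it.
            raise ValueError(f"cannot decode color {color!r}")
        prev ^= int(color)  # next base = previous base XOR color
        decoded.append(BASES[prev])
    return "".join(decoded)


if __name__ == "__main__":
    print(decode_colorspace("T0123"))  # TGAT
    # Same read with one miscalled color at position 2: every later base changes.
    print(decode_colorspace("T0023"))  # TTCG
```

In the example, one wrong color turns `TGAT` into `TTCG`, so three of the four decoded bases differ even though only one color call was wrong.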

For the custom genome, I have managed to load it (the genome in FASTA and the annotation in BED), but I am not sure how to assign the annotations to the genome. Also, does TopHat require an annotated genome? I read that it doesn't, but I'm not sure... I fear that my gene is spliced, and I would like to be able to pull it out of the output data.
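On the BED question: as far as I know, an annotation is tied to a genome only by sequence name, so the first column of every annotation line must match a sequence name in the FASTA exactly. TopHat (`-G`) and Cufflinks (`-g`/`-G`) also expect GTF/GFF rather than BED. The minimal Python sketch below, with a hypothetical function name, converts tab-separated BED lines (BED12 with blocks, or simpler BED3/BED6) into GTF exon lines. It assumes each BED line is one single-transcript gene, so the BED name is reused as both `gene_id` and `transcript_id`; unnamed lines get a coordinate-based name.

```python
# Minimal BED -> GTF conversion sketch (hypothetical helper, not a Galaxy tool).
# BED is 0-based, half-open; GTF is 1-based, inclusive.
import sys
from typing import List


def bed_to_gtf(bed_line: str, source: str = "bed_to_gtf") -> List[str]:
    """Convert one tab-separated BED line into GTF exon lines (one per block)."""
    f = bed_line.rstrip("\n").split("\t")
    chrom, start, end = f[0], int(f[1]), int(f[2])
    name = f[3] if len(f) > 3 else f"{chrom}:{start}-{end}"
    strand = f[5] if len(f) > 5 else "."
    if len(f) >= 12:
        # BED12: blockSizes and blockStarts (relative to chromStart) define the exons.
        sizes = [int(x) for x in f[10].rstrip(",").split(",")]
        offsets = [int(x) for x in f[11].rstrip(",").split(",")]
        blocks = [(start + o, start + o + s) for s, o in zip(sizes, offsets)]
    else:
        blocks = [(start, end)]  # no blocks: treat the whole interval as one exon
    # Assumption: one transcript per gene, both named after the BED name column.
    attrs = f'gene_id "{name}"; transcript_id "{name}";'
    return [
        "\t".join([chrom, source, "exon", str(b_start + 1), str(b_end), ".", strand, ".", attrs])
        for b_start, b_end in blocks
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bed_to_gtf.py annotation.bed > annotation.gtf
    with open(sys.argv[1]) as bed:
        for line in bed:
            if line.strip() and not line.startswith(("track", "browser", "#")):
                print("\n".join(bed_to_gtf(line)))
```

If the converted GTF's first column does not match the FASTA headers (for example `chr1` versus `1`), the tools will silently find no overlapping annotation, so that is worth checking before running TopHat.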

