Hi Guys

I posted the same question on SEQanswers a couple of days ago but didn't get a response. 
Maybe you can help me with this.

I am trying to calculate RPKM from my TopHat alignments but have run into an 
issue that I believe could skew my results.

I have ~49 million input reads to TopHat, but TopHat reports ~55 million mapped 
reads. I assume more reads are reported as mapped than were input because of 
the "--max-multihits 15" option I set. From the manual:  
"Instructs TopHat to allow up to this many alignments to the reference for a 
given read, and suppresses all alignments for reads with more than this many 
alignments."
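
To illustrate why the mapped count can exceed the input count, here is a minimal 
sketch. It assumes single-end reads and reduces each reported alignment to a 
hypothetical (read_name, target) pair. Counting records counts alignments, while 
counting distinct read names counts reads, so a read with several hits is 
counted several times in the first total.

```python
from collections import Counter


def summarize_alignments(records):
    """Summarize alignment records.

    records: iterable of (read_name, target) pairs, one per reported alignment
    (a hypothetical, simplified stand-in for BAM records; single-end reads assumed).
    Returns (total_alignments, distinct_reads, uniquely_mapped_reads).
    """
    records = list(records)
    # Number of reported alignments per read name.
    per_read = Counter(name for name, _ in records)
    total_alignments = len(records)
    distinct_reads = len(per_read)
    unique_reads = sum(1 for hits in per_read.values() if hits == 1)
    return total_alignments, distinct_reads, unique_reads


# Toy data: r2 hits two paralogs, r3 hits three loci.
toy = [
    ("r1", "geneA"),
    ("r2", "geneB"), ("r2", "geneB_paralog"),
    ("r3", "chr1:100"), ("r3", "chr5:200"), ("r3", "chr9:300"),
    ("r4", "geneA"),
]
print(summarize_alignments(toy))  # (7, 4, 2)
```

Here 4 reads produce 7 alignments, the same effect as 49 million reads yielding 
~55 million reported mappings.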

For the RPKM calculation, I am not sure which number to use for total mapped 
reads:

1. Total reads mapped by TopHat, including multireads
2. Total uniquely mapped reads
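
For reference, RPKM for a gene is usually computed as C * 10^9 / (L * N), where 
C is the number of reads counted for the gene, L is the gene length in bp, and N 
is the total mapped reads. The sketch below uses hypothetical numbers (not from 
my data) to show that the choice of N alone rescales every gene's RPKM in a 
sample by the same factor.

```python
def rpkm(count, gene_length_bp, total_mapped):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return count * 1e9 / (gene_length_bp * total_mapped)


# Hypothetical values for illustration only.
count, length = 500, 2_000
print(rpkm(count, length, 50_000_000))  # 5.0  (denominator from option 1)
print(rpkm(count, length, 40_000_000))  # 6.25 (smaller denominator, as in option 2)
```

Within one sample this only changes the scale; it matters more when comparing 
samples whose multimapping rates differ.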

If I go with #2, then I think I should also discard all multireads when 
counting reads per gene, which could eliminate the counts (and therefore the 
RPKM) for paralogous genes.
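
A minimal sketch of that concern, assuming each alignment record carries the 
number of reported alignments for its read (as the SAM NH tag does). The 
records and gene names below are hypothetical. When only reads with NH == 1 are 
kept, reads shared between two paralogs are dropped entirely.

```python
from collections import Counter


def count_per_gene(records, unique_only):
    """Count reads per gene.

    records: (read_name, gene, nh) tuples, where nh is the number of reported
    alignments for that read (analogous to the SAM NH tag).
    If unique_only is True, alignments of multireads (nh > 1) are skipped.
    """
    counts = Counter()
    for _, gene, nh in records:
        if unique_only and nh > 1:
            continue
        counts[gene] += 1
    return counts


# Toy data: r2 and r3 map equally well to geneB and its paralog.
toy = [
    ("r1", "geneA", 1),
    ("r2", "geneB", 2), ("r2", "geneB_paralog", 2),
    ("r3", "geneB", 2), ("r3", "geneB_paralog", 2),
    ("r4", "geneA", 1),
]
all_counts = count_per_gene(toy, unique_only=False)
unique_counts = count_per_gene(toy, unique_only=True)
print(all_counts["geneB"], unique_counts["geneB"])  # 2 0
```

With all alignments kept, each multiread contributes to both paralogs; with 
unique reads only, both paralogs end up with a count of zero.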


What do you think is my best option for #total_mapped_reads?

Thanks!
-Abhi


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
