Hi Guys
I posted the same question on SEQanswers a couple of days ago but didn't get
a response. Maybe you guys can educate me on this.
I am trying to calculate RPKM from TopHat output but have come across an
issue that I believe could skew my results.
My input to TopHat is ~49 million reads, but TopHat reports ~55 million
mapped reads. I assume I am getting more mapped reads than total input
because of the "--max-multihits 15" option I had set, so that a read aligning
to several locations is counted once per alignment:
"Instructs TopHat to allow up to this many alignments to the reference for a
given read, and suppresses all alignments for reads with more than this many
alignments." -> manual
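
To check whether multireads really explain the gap, the alignment records can be compared against the distinct read names. Below is a minimal sketch, assuming single-end reads and a plain-text SAM input; the function name and the toy records are made up for illustration and are not my real data.

```python
def summarize_sam(lines):
    """Return (alignment records, distinct mapped reads, uniquely mapped reads).

    Assumes single-end data, so one read name corresponds to one read.
    """
    n_alignments = 0
    hits_per_read = {}
    for line in lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # FLAG bit 0x4 marks an unmapped read.
        if flag & 0x4:
            continue
        n_alignments += 1
        name = fields[0]
        hits_per_read[name] = hits_per_read.get(name, 0) + 1
    n_reads = len(hits_per_read)
    n_unique = sum(1 for n in hits_per_read.values() if n == 1)
    return n_alignments, n_reads, n_unique


# Hypothetical toy records: r1 maps once, r2 maps twice, r3 is unmapped.
toy = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*",
    "r2\t256\tchr2\t300\t3\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(summarize_sam(toy))
```

If the first number is larger than the second, the excess comes from reads reported at more than one location.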
Now for the RPKM calculation I am not sure which number I should use for
total mapped reads:
1. Total reads mapped by Tophat including multireads
2. Total uniquely mapped reads
If I go with #2, then I think I should also remove all multireads when
counting the reads that map to my genes, which could wipe out the RPKM
values for paralogous genes.
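
For reference, this is the RPKM formula I have in mind, as a minimal sketch. The function name and all numbers are hypothetical; they only show how the choice of total_mapped changes the value for the same gene.

```python
def rpkm(gene_count, gene_length_bp, total_mapped):
    """Reads per kilobase of gene length per million mapped reads."""
    return gene_count * 1e9 / (gene_length_bp * total_mapped)


# Hypothetical gene: 500 reads counted, 2000 bp long.
# Denominator option 1: a larger total that includes multireads.
with_multireads = rpkm(500, 2000, 50_000_000)
# Denominator option 2: a smaller total of uniquely mapped reads only.
unique_only = rpkm(500, 2000, 40_000_000)
print(with_multireads, unique_only)
```

The same gene count gives a higher RPKM with the smaller denominator, so the gene counts and the total should be built from the same set of reads.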
What do you think is my best bet for getting #total_mapped_reads?
Thanks!
-Abhi
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing