Hi,

On Wed, Apr 28, 2010 at 3:12 PM, Pratap, Abhishek <[email protected]> wrote:
> Hi Guys,
>
> I posted the same question on SEQanswers a couple of days ago but didn't get a
> response. Maybe you guys can educate me on this.
>
> I am trying to calculate RPKM on the TopHat data but have come across an
> issue that I believe could skew my results.
>
> My number of input reads to TopHat is ~49 million. The number of reads reported
> by TopHat as mapped is ~55 million. I assume I am getting more reads mapped
> than the total input because of the "--max-multihits 15" option I had set:
> "Instructs TopHat to allow up to this many alignments to the reference for a
> given read, and suppresses all alignments for reads with more than this many
> alignments." (from the manual)
>
> Now, for the RPKM calculation, I am not sure what number I should use for
> total mapped reads:
>
> 1. Total reads mapped by TopHat, including multireads
> 2. Total uniquely mapped reads
>
> If I go with #2, then I think I should also remove all multireads when I am
> counting the reads mapping to my genes, which could eliminate the RPKM
> counts for paralogous genes.
>
> What do you think is my best bet for getting #total_mapped_reads?
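For concreteness: RPKM for a gene is 10^9 × C / (N × L), where C is the number of reads counted for the gene, L is the gene-model length in bases (the "K"), and N is the total number of mapped reads (the "M"). Option #1 and option #2 change N, and option #2 also changes C and, ideally, L. Below is a minimal Python sketch that computes both versions side by side so one can be plotted against the other. The gene names, counts, lengths, and the unique-read total are hypothetical placeholders; only the ~55 million figure comes from the message above.

```python
# Hypothetical per-gene inputs; replace with your own counts.
#   counts_all:    reads assigned to the gene, multireads included (option #1)
#   counts_unique: uniquely mapped reads assigned to the gene (option #2)
#   length_bp:     exonic length of the gene model
#   mappable_bp:   uniquely mappable positions in the gene model (assumed precomputed)
GENES = {
    "geneA": {"counts_all": 1200, "counts_unique": 1100, "length_bp": 2000, "mappable_bp": 1900},
    # A paralog-like gene: most of its reads are multireads.
    "geneB": {"counts_all": 300, "counts_unique": 40, "length_bp": 1500, "mappable_bp": 400},
    "geneC": {"counts_all": 5000, "counts_unique": 4800, "length_bp": 4000, "mappable_bp": 3900},
}

TOTAL_MAPPED_ALL = 55_000_000     # option #1: the ~55 million reported by TopHat
TOTAL_MAPPED_UNIQUE = 40_000_000  # option #2: hypothetical unique-read total


def rpkm(reads, length_bp, total_mapped):
    """Reads per kilobase of gene model per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)


def compare_rpkm(genes, total_all, total_unique):
    """Return {gene: (rpkm_option1, rpkm_option2)} for plotting #1 against #2."""
    result = {}
    for name, g in genes.items():
        # Option #1: all reads, full gene length, total including multireads.
        r1 = rpkm(g["counts_all"], g["length_bp"], total_all)
        # Option #2: unique reads only, uniquely mappable length, unique total.
        r2 = rpkm(g["counts_unique"], g["mappable_bp"], total_unique)
        result[name] = (r1, r2)
    return result


if __name__ == "__main__":
    for name, (r1, r2) in compare_rpkm(GENES, TOTAL_MAPPED_ALL, TOTAL_MAPPED_UNIQUE).items():
        print(f"{name}\t{r1:.2f}\t{r2:.2f}")
```

Using the mappable length as L in option #2 keeps the numerator and the length on the same footing (both restricted to unique positions); feeding the printed pairs to any scatter-plot tool shows how far the genes fall from the diagonal.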
Either approach sounds reasonable, and yes, if you go with #2, I would remove multireads when counting reads for RPKM. Also, if you go with #2, you might want to make sure your K is calculated from the number of uniquely mappable positions in your gene model, just so you compare like with like.

Why don't you try calculating RPKM both ways (#1 and #2), then plot the expression of each gene from #1 against its expression from #2? I suspect the plot will be pretty close to the diagonal, but you never know unless you try.

Let us know :-)

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
