Hi all,

Sorry for abusing the list (and *-seq terminology), as this isn't really a Bioconductor-related question, but I was curious how you all deal with "pileups" in RNA-seq data. By pileup I mean separate observations of the same read (i.e., two or more different reads that map to the exact same genomic locus), aka duplicate reads.
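
To make that definition concrete, here is a minimal Python sketch (not tied to any Bioconductor package). The `Read` record and the function names are made up for illustration, and using chromosome, strand, and start as the duplicate key is an assumption; paired-end data would likely need the mate position as well.

```python
from collections import Counter, namedtuple

# Hypothetical minimal read record; real data would come from a SAM/BAM parser.
Read = namedtuple("Read", ["name", "chrom", "strand", "start"])


def position_key(read):
    """Reads sharing this key are treated as duplicates of one another."""
    return (read.chrom, read.strand, read.start)


def count_per_position(reads):
    """Return a Counter mapping each position key to its number of reads."""
    return Counter(position_key(r) for r in reads)


def collapse_duplicates(reads, keep=1):
    """Keep at most `keep` reads per position, preserving input order."""
    seen = Counter()
    kept = []
    for read in reads:
        key = position_key(read)
        if seen[key] < keep:
            kept.append(read)
        seen[key] += 1
    return kept


reads = [
    Read("r1", "chr1", "+", 100),
    Read("r2", "chr1", "+", 100),
    Read("r3", "chr1", "+", 100),
    Read("r4", "chr1", "-", 100),  # same start, opposite strand: not a duplicate
    Read("r5", "chr2", "+", 250),
]

counts = count_per_position(reads)
unique = collapse_duplicates(reads)          # plain duplicate removal
capped = collapse_duplicates(reads, keep=2)  # keep at most 2 per position
```

With `keep=1` this is plain duplicate removal; a larger `keep` retains a fixed number of copies at each position.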
I'm pretty sure it's common practice to remove them in ChIP-seq experiments since, I believe, they are usually assumed to be PCR artifacts. With RNA-seq, though, genes vary in their expression level, so a highly expressed gene can legitimately produce many identical reads, and removing all of them probably isn't a given. That having been said, I have been removing them anyway.

I think I've seen some references to keeping only N reads that map to the same place, where N seems to be chosen arbitrarily and applied globally. I guess it would make more sense to determine N on a gene-by-gene basis, perhaps by first quantifying each gene's expression from its uniquely appearing reads.

So, I'm just curious if/how you folks are tackling this issue.

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
