Picard (a Java-based toolkit) is very useful for this type of thing. It can be done in R, but the Picard folks have put a lot of thought into how to find and mark duplicates, including optical duplicates. Their approach pays off most if you have paired-end data.
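To make the idea of position-based duplicates concrete for single-end reads, here is a minimal sketch in Python (not R, and not Picard's actual algorithm). It assumes the reads are already aligned and reduced to hypothetical `(chrom, start, end, strand)` tuples with 1-based inclusive coordinates. The function names are made up for illustration. Reads are grouped by chromosome, strand, and 5' position, so mismatches within a read do not affect the grouping.

```python
from collections import Counter


def five_prime_position(start, end, strand):
    """Return the 5' coordinate of an alignment (1-based, inclusive)."""
    # On the minus strand the 5' end of the read is the rightmost base.
    return start if strand == "+" else end


def count_duplicates(alignments):
    """Count position-based duplicates among single-end alignments.

    `alignments` is an iterable of (chrom, start, end, strand) tuples.
    Reads sharing chromosome, strand, and 5' position form one group;
    every read in a group beyond the first counts as a duplicate.
    Returns (total_reads, duplicate_reads).
    """
    groups = Counter(
        (chrom, strand, five_prime_position(start, end, strand))
        for chrom, start, end, strand in alignments
    )
    total = sum(groups.values())
    duplicates = total - len(groups)
    return total, duplicates


# Toy input, invented for illustration only.
reads = [
    ("chr1", 100, 149, "+"),
    ("chr1", 100, 149, "+"),  # same 5' start and strand: duplicate
    ("chr1", 100, 147, "+"),  # shorter alignment, same 5' start: duplicate
    ("chr1", 100, 149, "-"),  # minus strand, 5' end is 149: distinct group
    ("chr2", 500, 549, "-"),
    ("chr2", 501, 549, "-"),  # same 5' end (549) on minus strand: duplicate
]

total, dup = count_duplicates(reads)
print(f"{dup} of {total} reads are duplicates ({dup / total:.0%})")
```

Keep in mind that with RNA-seq, highly expressed transcripts can legitimately produce many reads starting at the same position, so a count like this is an upper bound on PCR duplicates, not an exact estimate.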
Sean

On Thu, Aug 11, 2011 at 12:50 PM, Kunbin Qu <k...@genomichealth.com> wrote:
> Hi, I have some human single end RNA-seq runs on HiSeq. Can I have some
> suggestions on how to assess how many duplicated reads out of these
> libraries? I looked around srFilter() in ShortRead, but have not had a clear
> thought on how to implement it? Should I use IRanges as an alternative to
> assess the unique starting site after the mapping? If so, what function do
> you suggest? I'd like to count reads which map to the same location (even
> with some mismatches) as duplicates. Thanks.
>
> -Kunbin
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing