Thanks for the clarification. How do I filter the reads that occur exactly once?
________________________________ From: Martin Morgan <[email protected]> Cc: [email protected] Sent: Wed, January 27, 2010 5:29:00 PM Subject: Re: [Bioc-sig-seq] unique reads count On 01/27/2010 02:57 PM, joseph wrote: > If I understand this correctly, !duplicated does not count unique reads: > !duplicated(c("A", "B", "B", "C")) > [1] TRUE TRUE FALSE TRUE > > in your example, there are only two uniques (not 3 as counted by !duplicated). Well, it depends on what your definition of 'duplicate' is, but yes, I think we agree. > Is it correct to say that the number of unique reads is given by the nReads > when nOccurrences=1 using the tables function? again unique needs definition (see unique(c("A", "B", "B", "C")), for instance!) but yes, nOccurrences is the number of reads that occur exactly once. Martin > > > ________________________________ > From: Martin Morgan <[email protected]> > Cc: [email protected] > Sent: Wed, January 27, 2010 1:06:55 PM > Subject: Re: [Bioc-sig-seq] unique reads count > > Hi Joseph -- > > On 01/27/2010 12:33 PM, joseph wrote: >> Hello >> I have a ShortReadQ object: >>> rfq >> class: ShortReadQ >> length: 16115723 reads; width: 34 cycles >> >> I used the negation of the result from srduplicated to count the unique >> reads: >>> sum(!srduplicated(sread(rfq))) >> [1] 4545719 >> >> But also I looked at the frequency with which each read occurs using the >> tables function: >>> head(tables(rfq_s_3_mel)$distribution) >> nOccurrences nReads >> 1 1 4022038 >> 2 2 255649 > > srduplicated is behaving like 'duplicated', which is to return TRUE when > an element has already been seen > >> duplicated(c("A", "B", "B", "C")) > [1] FALSE FALSE TRUE FALSE > > There's one duplicate, the second 'B'. > > After example(srduplicated) I have > >> tables(sread(rfq))$distribution > nOccurrences nReads > 1 1 239 > 2 2 7 > 3 3 1 >> sum(srduplicated(sread(rfq))) > [1] 9 > > there are 7 reads that are the second of two reads, and 2 reads that are > the second and third of three reads. > > Martin > >> >> I expected that for nOccurrences=1, the nReads should be the same as what I >> got with !srduplicated. >> >> Can anybody explain why I got different counts? >> Thank you >> Joseph Dhahbi >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioc-sig-sequencing mailing list >> [email protected] >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing > > -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
