On 31-Aug-09 19:16:33, Erik Iverson wrote:
> Dear R-help,
> Could someone please try to explain this paradox to me? What is
> more likely to show up first in a string of coin tosses, "Heads
> then Tails", or "Heads then Heads"?
>
> ##generate 2500 strings of random coin flips
> ht <- replicate(2500,
>                 paste(sample(c("H", "T"), 100, replace = TRUE),
>                       collapse = ""))
>
> ## find first occurrence of HT
> mean(regexpr("HT", ht))+1 #mean of HT position, 4
>
> ## find first occurrence of HH
> mean(regexpr("HH", ht))+1 #mean of HH position, 6
>
> FYI, this is not homework, I have not been in school in years.
> I saw a similar problem posed in a blog post on the Revolutions R
> blog, and although I believe the answer, I'm having a hard time
> figuring out why this should be?
>
> Thanks,
> Erik Iverson

Be very careful about the statement of the problem!

[1] The probability that "HH" will occur first (i.e. before "HT")
    is the same as the probability that "HT" will occur first
    (i.e. before "HH").

[2] However, the probability that the first occurrence of "HT"
    will be at a given position is generally not the same as the
    probability that the first occurrence of "HH" will be at that
    same position.

[1]: At the first occurrence of (either "HH" or "HT"), there is an
initial string S, ending in an "H", followed by either an "H"
(for "HH") or a "T" (for "HT"). Both are equally likely. So the
probability that the first occurrence of (either "HH" or "HT") is
an "HH" is the same as the probability that it is an "HT".

[2]: (A) the first occurrence of an "HH" is in a sequence of any
collection of "H" and "T", provided there is no "HH" in the
sequence and the last is "H", followed by "H". However, "HT" is
allowed to occur in the sequence.

But (B) the first occurrence of an "HT" is in a sequence of
(zero or more "T") followed by (1 or more "H") followed by "T".
This is the only pattern in which "HT" does not occur prior to
the final "HT". Similarly, "HH" is allowed to occur in the
sequence.

The reason that, in general, the probability of "HH" first
occurring at a given position is different from the probability
of "HT" first occurring at that position lies in the difference
between the number of possible sequences satisfying (A) and the
number of possible sequences satisfying (B).

The first few cases ("HH" or "HT" first completed at throw (k+1),
so that the "H" which begins that "HH" or "HT" is at position k)
are, with their probabilities:

  k=1:  HH      HT
        1/4     1/4

  k=2:  THH     HHT
                THT
        1/8     2/8

  k=3:  TTHH    HHHT
        HTHH    THHT
                TTHT
        2/16    3/16

  k=4:  TTTHH   HHHHT
        THTHH   THHHT
        HTTHH   TTHHT
                TTTHT
        3/32    4/32

The "HT" case is simple:

  P.HT[k] = Prob(1st "HT" at (k+1)) = k/(2^(k+1))

Exercise for the reader: Sum(P.HT) = 1

The "HH" case is more interesting.
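
The exercise for the "HT" case can be checked numerically. What follows is only a minimal sketch in R, assuming the formula P.HT[k] = k/(2^(k+1)) stated above. The cut-off kmax is an arbitrary value chosen here for illustration, since the infinite sum has to be truncated somewhere; it is not part of the original argument.

```r
## P.HT[k] = k/2^(k+1): probability that the first "HT" is completed on throw k+1
kmax <- 60                  # truncation point (assumption); the tail beyond it is negligible
k    <- 1:kmax
P.HT <- k / 2^(k + 1)

total.HT <- sum(P.HT)             # total probability, should be very close to 1
mean.HT  <- sum((k + 1) * P.HT)   # mean throw on which "HT" is first completed

print(c(total = total.HT, mean = mean.HT))
```

The second quantity is the mean throw on which "HT" is first completed; it should come out very close to 4, in line with the simulated value in the quoted question.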

Experimental scribblings on paper threw up an hypothesis, which
I decided to explore in R. Thanks to Gerrit Eichner for suggesting
the use of expand.grid()!

  ## Function to count sequences giving 1st HH on throw k+1
  ## (0 = "T", 1 = "H"; each row of M is one sequence of k throws)
  countHH <- function(k){
    M  <- as.matrix(expand.grid(rep(list(0:1),k)))
    ix <- (M[,k]==1) ## k must be an H (then k+1 will be H)
    ## exclude any sequence with an "HH" among throws 1..k;
    ## seq_len() keeps the loop empty when k=1
    for(i in seq_len(k-1)){
      ix <- ix & ( !((M[,i]==1)&(M[,i+1]==1)) )
    }
    sum(ix)
    ## list(Count=sum(ix),Which=M[ix,])
  }

Now, ignoring the case k=1:

  HHcounts <- sapply(2:12, countHH)
  rbind((3:13),HHcounts)
  #          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
  #             3    4    5    6    7    8    9   10   11    12    13
  # HHcounts    1    2    3    5    8   13   21   34   55    89   144

Lo and Behold, we have a Fibonacci sequence! Another exercise for
the reader ...

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09                                       Time: 10:38:58
------------------------------ XFMail ------------------------------
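
As a numerical follow-up to the "HH" counts, the sketch below turns them into probabilities and a mean position. It rests on an assumption: that the Fibonacci pattern continues for every k, which the enumeration only shows for k = 2 to 12. The names fib, P.HH and kmax are introduced here purely for illustration.

```r
## Assumption: the number of sequences giving the first "HH" on throw k+1
## is the k-th Fibonacci number (1, 1, 2, 3, 5, ...), as the counts suggest.
kmax <- 200                 # truncation point (assumption); the tail beyond it is negligible
fib  <- numeric(kmax)
fib[1:2] <- 1
for (i in 3:kmax) fib[i] <- fib[i - 1] + fib[i - 2]

k    <- 1:kmax
P.HH <- fib / 2^(k + 1)     # each sequence of k+1 throws has probability 1/2^(k+1)

total.HH <- sum(P.HH)             # total probability, should be very close to 1
mean.HH  <- sum((k + 1) * P.HH)   # mean throw on which "HH" is first completed

print(c(total = total.HH, mean = mean.HH))
```

Under that assumption the mean should come out very close to 6, against 4 for "HT", which matches the simulated means in the quoted question even though "HH" and "HT" are equally likely to be the first of the two to appear.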