Re: [R] can you tell what .Random.seed *was*?
Duncan Murdoch wrote: 1) can you tell me what my original set.seed() value was? (I wouldn't be able to figure it out, but maybe someone can) The only way I know is to test all 2^32 possible values of the seed. I think cryptographers would know faster ways. Well, I'm not a cryptographer, but I know a faster way: rainbow tables. http://en.wikipedia.org/wiki/Rainbow_table Given that the algorithm to generate these images is known and that each seed always gives the same image as output, you can simply precompute all possible images, hash them using your favorite algorithm -- say, SHA-256 -- and record the seed-to-hash correspondence on disk. Then given an output image, you can hash it and use that to look up the seed. It can take a long time to generate all the images, but then you have database like lookup speeds for image-to-seed correspondence. This is not just a theoretical idea. There are underground sites where you can put in, say, an MD5 password hash and get out the likely password that was actually used. This allows a black hat to break into one site, grab their password hash database, reverse engineer the passwords, and then go use them to bang on the front door of other sites users of that site he first compromised also use. There are defenses against this: salting the passwords and using passwords too big to appear in rainbow tables are easiest. Now, if the seed was removed just before the values were generated, the seed would be generated from the system clock. If you knew the time that this occurred approximately, the search could be a lot faster. This also helps with the rainbow table approach. Given that the seed for the generation algorithm is always the current wall time, you can restrict the needed rainbow table size greatly. You simply have to know when the algorithm was first put into use, then start your rainbow table with that time's value as the first seed, and only compute up to now plus whatever you need for future operations. For instance, you can cover about 3 years worth of image production in about 1/45 the time as it takes to cover all 2^32 possible images. Say it takes a month to generate a rainbow table covering those 3 years. Full coverage would then take nearly 4 years. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] can you tell what .Random.seed *was*?
On Thu, May 14, 2009 at 3:36 PM, G. Jay Kerns gke...@ysu.edu wrote: set.seed(something) x - rnorm(100) y - runif(500) # bunch of other stuff ... Now, I give you a copy of my script.R (with the set.seed statement removed, of course) together with the .RData file that was generated by the save.image() command. ... 1) can you tell me what my original set.seed() value was?... 2) is it possible *in principle* to figure out what set.seed was, given the above? Set.seed takes an integer argument, that is, 2^32-1 distinct values (cf NA_integer_), so the very simplest approach, brute-force search, has a hope of working: whatseed - function (v) { i - as.integer(-2^31+1); max - as.integer(2^31-1) while (imax) { set.seed(i); if (runif(1)==v) return(i); i-i+1 } } (OK, being able to figure it out in 2*10^68 years doesn't count, but within a couple months is acceptable.) set.seed(-2^31+10) system.time(whatseed(runif(1))) user system elapsed 1.530.001.53 2^32*(1.53/10)/3600 = 18.25 18 hours 3) does the answer change if there is a remove(.Random.seed) command right before the save.image() command? Depending on which RNG algorithm (RNGkind) you use, there may be cryptographic techniques that are more efficient than brute-force search, especially if the full internal state (.Random.seed) is preserved. This all assumes that the seed is set *only* with set.seed. If .Random.seed is modified directly, there are many more possibilities for most of the RNGs. -s __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] can you tell what .Random.seed *was*?
Set.seed takes an integer argument, that is, 2^32-1 distinct values (cf NA_integer_), so the very simplest approach, brute-force search, has a hope of working: whatseed - function (v) { i - as.integer(-2^31+1); max - as.integer(2^31-1) while (imax) { set.seed(i); if (runif(1)==v) return(i); i-i+1 } } (OK, being able to figure it out in 2*10^68 years doesn't count, but within a couple months is acceptable.) set.seed(-2^31+10) system.time(whatseed(runif(1))) user system elapsed 1.53 0.00 1.53 2^32*(1.53/10)/3600 = 18.25 18 hours 3) does the answer change if there is a remove(.Random.seed) command right before the save.image() command? Depending on which RNG algorithm (RNGkind) you use, there may be cryptographic techniques that are more efficient than brute-force search, especially if the full internal state (.Random.seed) is preserved. This all assumes that the seed is set *only* with set.seed. If .Random.seed is modified directly, there are many more possibilities for most of the RNGs. -s Thanks very much to Warren and Stavros for their additional insight. Putting all of this together, I think I am now ready to formulate my question intelligently: Using Sweave, I want to distribute randomly generated problems AND answers to both teacher AND student. More precisely, I want to distribute: 1) the .Rnw file 2) the .RData file saved near the end of the Sweave process. I want it to be *easy* for the Instructor to change my seed and generate new problems. I want it to be *difficult* for students to figure out the seed and automatically generate solutions on their own. Of course, difficult is a relative term, since what is difficult for them may well be easy for me, and what is difficult for me will be trivial to cryptographers and some people on this list. The audience would be, say, upper division undergraduate students at a public university. What is clear so far: a brute force search of set.seed() is really pretty easy and fast... even for students at this level. However, relating to Duncan's second remark: what if the Instructor inserted an *unknown* very large number of calls to the RNG near the beginning of the .Rnw (but after the set.seed)... and did not distribute this information to the students... that would make it much harder, yes? Any ideas that are even better than this? Conceivably, some of my students will be searching these archives in the future; please feel free to respond off-list if appropriate. Jay -- *** G. Jay Kerns, Ph.D. Associate Professor Department of Mathematics Statistics Youngstown State University Youngstown, OH 44555-0002 USA Office: 1035 Cushwa Hall Phone: (330) 941-3310 Office (voice mail) -3302 Department -3170 FAX E-mail: gke...@ysu.edu http://www.cc.ysu.edu/~gjkerns/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] can you tell what .Random.seed *was*?
G. Jay Kerns wrote: I want it to be *difficult* for students to figure out the seed and automatically generate solutions on their own. Hmmm Would it really be a bad thing if someone reverse engineered this to generate answers given the problem set? If it's hard enough to do that, it'd be more worth solving than the given problem set. I call that extra credit. a brute force search of set.seed() is really pretty easy and fast... even for students at this level. Either you're misunderstanding Stavros' benchmark results, or I am. Could easily be the latter...I'm an R newbie. As far as I can tell, the inner part of the loop does very little. If that's right, Stavros is saying it will take 18 hours to try every possible seed when the algorithm based on that seed takes almost no time to run. But, if generating each problem set takes, say, a minute, it will take 4.7 million years to generate a complete rainbow table when there are 2^32 possible seeds. what if the Instructor inserted an *unknown* very large number of calls to the RNG near the beginning of the .Rnw (but after the set.seed)... and did not distribute this information to the students... that would make it much harder, yes? There are better ways. As above, one key to making rainbow tables impractical is making the per-iteration time long enough. Even if it only takes a second to generate each possible problem set, that's enough when multiplied by high enough powers of 2. The other key is using big enough powers of 2. I hadn't looked into R's random number generation before, but it appears quite robust. Seeding it with the current wall clock time (a 32-bit integer on most systems) is an insult to its capability. The default pseudo-random number generator (PRNG) in my copy of R is the Mersenne Twister, a truly awesome algorithm. It's capable of very high quality results, as long as you give it a good seed. It will take a vector of *many* integers as a seed, not just one. It's not clear to me from the R docs if you can pass an arbitrary array of integers with any value, or if it needs something special. Assuming you can give it any old passel of randomness as a seed, you just have to find a good source of randomness to create that seed. On a Linux box, you could concatenate several dozen bytes read from /dev/random, the current wall clock time in microseconds, the inode of the R script being run, the process ID of the R interpreter, and the current mouse cursor position into a single string. Feed all that into a hash algorithm, and break off pieces of that 4 bytes long, cast them to integers, and send that array of ints to set.seed(). If you use SHA-256 as the hash algorithm, that scheme should give you enough input randomness to get any of the possible 2^256 hash outputs, making that the amount of possible problem sets. That's more than a rainbow table buster...there aren't enough atoms in the visible universe to construct a computer big enough to cope with 2^256 possible outputs. That said, the quality of the PRNG just *allows* you to avoid screwing up. It doesn't make it impossible make a weak algorithm. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] can you tell what .Random.seed *was*?
On Fri, May 15, 2009 at 12:07 PM, Stavros Macrakis macra...@alum.mit.edu wrote: system.time(whatseed(runif(1))) Sorry, though I got lucky and my overall result is roughly correct, this is an incorrect time measure. It should be r - runif(1); system.time(whatseed(r)) because R's call-by-need semantics don't evaluate the runif before it starts running whatseed. The correct time (on my machine) is then 28 hours, not 18. Better to avoid side-effect functions as arguments -s __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] can you tell what .Random.seed *was*?
On 15 May 2009 at 13:08, G. Jay Kerns wrote: | Thanks very much to Warren and Stavros for their additional insight. | Putting all of this together, I think I am now ready to formulate my | question intelligently: | | Using Sweave, I want to distribute randomly generated problems AND | answers to both teacher AND student. | | More precisely, I want to distribute: | 1) the .Rnw file | 2) the .RData file saved near the end of the Sweave process. | | I want it to be *easy* for the Instructor to change my seed and | generate new problems. | | I want it to be *difficult* for students to figure out the seed and | automatically generate solutions on their own. | | Of course, difficult is a relative term, since what is difficult | for them may well be easy for me, and what is difficult for me will | be trivial to cryptographers and some people on this list. The | audience would be, say, upper division undergraduate students at a | public university. | | | What is clear so far: a brute force search of set.seed() is really | pretty easy and fast... even for students at this level. | | However, relating to Duncan's second remark: what if the Instructor | inserted an *unknown* very large number of calls to the RNG near the | beginning of the .Rnw (but after the set.seed)... and did not | distribute this information to the students... that would make it | much harder, yes? | | Any ideas that are even better than this? You could use (one or more) seeds from a hardware RNGs. The website http://random.org by Mads Haahr distributes such numbers (and my CRAN package 'random' gets them for you in a convenient fashion). Have a look at the docs at random.org, and the two vignettes in the random package: RANDOM.ORG offers true random numbers to anyone on the Internet. The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs. People use RANDOM.ORG for holding drawings, lotteries and sweepstakes, to drive games and gambling sites, for scientific applications and for art and music. Hth, Dirk -- Three out of two people have difficulties with fractions. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] can you tell what .Random.seed *was*?
Dear R-help, Suppose I write a script that looks something like this: script.R set.seed(something) x - rnorm(100) y - runif(500) # bunch of other stuff save.image() ### end of script.R Now, I give you a copy of my script.R (with the set.seed statement removed, of course) together with the .RData file that was generated by the save.image() command. Question: 1) can you tell me what my original set.seed() value was? (I wouldn't be able to figure it out, but maybe someone can) 2) is it possible *in principle* to figure out what set.seed was, given the above? (OK, being able to figure it out in 2*10^68 years doesn't count, but within a couple months is acceptable.) 3) does the answer change if there is a remove(.Random.seed) command right before the save.image() command? Any thoughts would be appreciated. Cheers, Jay *** G. Jay Kerns, Ph.D. Associate Professor Department of Mathematics Statistics Youngstown State University Youngstown, OH 44555-0002 USA Office: 1035 Cushwa Hall Phone: (330) 941-3310 Office (voice mail) -3302 Department -3170 FAX E-mail: gke...@ysu.edu http://www.cc.ysu.edu/~gjkerns/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] can you tell what .Random.seed *was*?
On 5/14/2009 3:36 PM, G. Jay Kerns wrote: Dear R-help, Suppose I write a script that looks something like this: script.R set.seed(something) x - rnorm(100) y - runif(500) # bunch of other stuff save.image() ### end of script.R Now, I give you a copy of my script.R (with the set.seed statement removed, of course) together with the .RData file that was generated by the save.image() command. Question: 1) can you tell me what my original set.seed() value was? (I wouldn't be able to figure it out, but maybe someone can) The only way I know is to test all 2^32 possible values of the seed. I think cryptographers would know faster ways. 2) is it possible *in principle* to figure out what set.seed was, given the above? (OK, being able to figure it out in 2*10^68 years doesn't count, but within a couple months is acceptable.) I think the brute force search would take about 4-5 days on my old computer, so a fast one could do it in a few hours. 3) does the answer change if there is a remove(.Random.seed) command right before the save.image() command? Not with the brute force search. The cryptographers would probably be able to make use of the random seed pretty well. Now, if the seed was removed just before the values were generated, the seed would be generated from the system clock. If you knew the time that this occurred approximately, the search could be a lot faster. On the other hand, if there was an unknown large amount of work between the time of seed generation and the generation of the values x and y, it would be much more difficult to brute force the solution, because most of the values of .Random.seed are not reachable either by the clock method or set.seed, they are only reachable by lots of calls to the underlying generator. So for each seed value you'd need to do every possible number of RNG generations before generating x and y. Duncan Murdoch Any thoughts would be appreciated. Cheers, Jay *** G. Jay Kerns, Ph.D. Associate Professor Department of Mathematics Statistics Youngstown State University Youngstown, OH 44555-0002 USA Office: 1035 Cushwa Hall Phone: (330) 941-3310 Office (voice mail) -3302 Department -3170 FAX E-mail: gke...@ysu.edu http://www.cc.ysu.edu/~gjkerns/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] can you tell what .Random.seed *was*?
Dear Duncan, On Thu, May 14, 2009 at 4:39 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 5/14/2009 3:36 PM, G. Jay Kerns wrote: Question: 1) can you tell me what my original set.seed() value was? (I wouldn't be able to figure it out, but maybe someone can) The only way I know is to test all 2^32 possible values of the seed. I think cryptographers would know faster ways. 2) is it possible *in principle* to figure out what set.seed was, given the above? (OK, being able to figure it out in 2*10^68 years doesn't count, but within a couple months is acceptable.) I think the brute force search would take about 4-5 days on my old computer, so a fast one could do it in a few hours. 3) does the answer change if there is a remove(.Random.seed) command right before the save.image() command? Not with the brute force search. The cryptographers would probably be able to make use of the random seed pretty well. Now, if the seed was removed just before the values were generated, the seed would be generated from the system clock. If you knew the time that this occurred approximately, the search could be a lot faster. On the other hand, if there was an unknown large amount of work between the time of seed generation and the generation of the values x and y, it would be much more difficult to brute force the solution, because most of the values of .Random.seed are not reachable either by the clock method or set.seed, they are only reachable by lots of calls to the underlying generator. So for each seed value you'd need to do every possible number of RNG generations before generating x and y. Duncan Murdoch Thanks. That is exactly what I was looking for. R-help is awesome. Jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.