Re: [R] can you tell what .Random.seed *was*?

2009-05-15 Thread Warren Young

Duncan Murdoch wrote:

1) can you tell me what my original set.seed() value was?  (I wouldn't
be able to figure it out, but maybe someone can)


The only way I know is to test all 2^32 possible values of the seed.  I 
think cryptographers would know faster ways.


Well, I'm not a cryptographer, but I know a faster way: rainbow tables.

http://en.wikipedia.org/wiki/Rainbow_table

Given that the algorithm to generate these images is known and that each 
seed always gives the same image as output, you can simply precompute 
all possible images, hash them using your favorite algorithm -- say, 
SHA-256 -- and record the seed-to-hash correspondence on disk.  Then 
given an output image, you can hash it and use that to look up the seed.


It can take a long time to generate all the images, but then you have 
database like lookup speeds for image-to-seed correspondence.


This is not just a theoretical idea.  There are underground sites where 
you can put in, say, an MD5 password hash and get out the likely 
password that was actually used.  This allows a black hat to break into 
one site, grab their password hash database, reverse engineer the 
passwords, and then go use them to bang on the front door of other sites 
users of that site he first compromised also use.  There are defenses 
against this: salting the passwords and using passwords too big to 
appear in rainbow tables are easiest.


Now, if the seed was removed just before the values were generated, the 
seed would be generated from the system clock.  If you knew the time 
that this occurred approximately, the search could be a lot faster.


This also helps with the rainbow table approach.

Given that the seed for the generation algorithm is always the current 
wall time, you can restrict the needed rainbow table size greatly.  You 
simply have to know when the algorithm was first put into use, then 
start your rainbow table with that time's value as the first seed, and 
only compute up to now plus whatever you need for future operations.


For instance, you can cover about 3 years worth of image production in 
about 1/45 the time as it takes to cover all 2^32 possible images.  Say 
it takes a month to generate a rainbow table covering those 3 years. 
Full coverage would then take nearly 4 years.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can you tell what .Random.seed *was*?

2009-05-15 Thread Stavros Macrakis
On Thu, May 14, 2009 at 3:36 PM, G. Jay Kerns gke...@ysu.edu wrote:
 set.seed(something)
 x - rnorm(100)
 y - runif(500)
 # bunch of other stuff
...
 Now, I give you a copy of my script.R (with the set.seed statement
 removed, of course) together with the .RData file that was generated
 by the save.image() command.
...
 1) can you tell me what my original set.seed() value was?...
 2) is it possible *in principle* to figure out what set.seed was,
 given the above?

Set.seed takes an integer argument, that is, 2^32-1 distinct values
(cf NA_integer_), so the very simplest approach, brute-force search,
has a hope of working:

whatseed - function (v)  {
   i - as.integer(-2^31+1); max - as.integer(2^31-1)
   while (imax) { set.seed(i); if (runif(1)==v) return(i); i-i+1 }
}

 (OK, being able to figure it out in 2*10^68 years
 doesn't count, but within a couple months is acceptable.)

set.seed(-2^31+10)
system.time(whatseed(runif(1)))
   user  system elapsed
   1.530.001.53

2^32*(1.53/10)/3600
= 18.25
18 hours

 3) does the answer change if there is a
 remove(.Random.seed)
 command right before the save.image() command?

Depending on which RNG algorithm (RNGkind) you use, there may be
cryptographic techniques that are more efficient than brute-force
search, especially if the full internal state (.Random.seed) is
preserved.

This all assumes that the seed is set *only* with set.seed.  If
.Random.seed is modified directly, there are many more possibilities
for most of the RNGs.

 -s

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can you tell what .Random.seed *was*?

2009-05-15 Thread G. Jay Kerns

 Set.seed takes an integer argument, that is, 2^32-1 distinct values
 (cf NA_integer_), so the very simplest approach, brute-force search,
 has a hope of working:

 whatseed - function (v)  {
   i - as.integer(-2^31+1); max - as.integer(2^31-1)
   while (imax) { set.seed(i); if (runif(1)==v) return(i); i-i+1 }
 }

 (OK, being able to figure it out in 2*10^68 years
 doesn't count, but within a couple months is acceptable.)

 set.seed(-2^31+10)
 system.time(whatseed(runif(1)))
   user  system elapsed
   1.53    0.00    1.53

 2^32*(1.53/10)/3600
    = 18.25
 18 hours

 3) does the answer change if there is a
 remove(.Random.seed)
 command right before the save.image() command?

 Depending on which RNG algorithm (RNGkind) you use, there may be
 cryptographic techniques that are more efficient than brute-force
 search, especially if the full internal state (.Random.seed) is
 preserved.

 This all assumes that the seed is set *only* with set.seed.  If
 .Random.seed is modified directly, there are many more possibilities
 for most of the RNGs.

             -s




Thanks very much to Warren and Stavros for their additional insight.
Putting all of this together, I think I am now ready to formulate my
question intelligently:

Using Sweave, I want to distribute randomly generated problems AND
answers to both teacher AND student.

More precisely, I want to distribute:
1) the .Rnw file
2) the .RData file saved near the end of the Sweave process.

I want it to be *easy* for the Instructor to change my seed and
generate new problems.

I want it to be *difficult* for students to figure out the seed and
automatically generate solutions on their own.


Of course, difficult is a relative term, since what is difficult
for them may well be easy for me, and what is difficult for me will
be trivial to cryptographers and some people on this list.  The
audience would be, say, upper division undergraduate students at a
public university.


What is clear so far: a brute force search of set.seed() is really
pretty easy and fast... even for students at this level.

However, relating to Duncan's second remark:  what if the Instructor
inserted an *unknown* very large number of calls to the RNG near the
beginning of the .Rnw (but after the set.seed)...  and did not
distribute this information to the students...  that would make it
much harder, yes?

Any ideas that are even better than this?

Conceivably, some of my students will be searching these archives in
the future;  please feel free to respond off-list if appropriate.

Jay


















-- 

***
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics  Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gke...@ysu.edu
http://www.cc.ysu.edu/~gjkerns/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can you tell what .Random.seed *was*?

2009-05-15 Thread Warren Young

G. Jay Kerns wrote:


I want it to be *difficult* for students to figure out the seed and
automatically generate solutions on their own.


Hmmm Would it really be a bad thing if someone reverse engineered 
this to generate answers given the problem set?  If it's hard enough to 
do that, it'd be more worth solving than the given problem set.  I call 
that extra credit.



a brute force search of set.seed() is really
pretty easy and fast... even for students at this level.


Either you're misunderstanding Stavros' benchmark results, or I am. 
Could easily be the latter...I'm an R newbie.


As far as I can tell, the inner part of the loop does very little.  If 
that's right, Stavros is saying it will take 18 hours to try every 
possible seed when the algorithm based on that seed takes almost no time 
to run.  But, if generating each problem set takes, say, a minute, it 
will take 4.7 million years to generate a complete rainbow table when 
there are 2^32 possible seeds.



what if the Instructor
inserted an *unknown* very large number of calls to the RNG near the
beginning of the .Rnw (but after the set.seed)...  and did not
distribute this information to the students...  that would make it
much harder, yes?


There are better ways.

As above, one key to making rainbow tables impractical is making the 
per-iteration time long enough.  Even if it only takes a second to 
generate each possible problem set, that's enough when multiplied by 
high enough powers of 2.


The other key is using big enough powers of 2.

I hadn't looked into R's random number generation before, but it appears 
quite robust.  Seeding it with the current wall clock time (a 32-bit 
integer on most systems) is an insult to its capability.


The default pseudo-random number generator (PRNG) in my copy of R is the 
Mersenne Twister, a truly awesome algorithm.  It's capable of very high 
quality results, as long as you give it a good seed.  It will take a 
vector of *many* integers as a seed, not just one.  It's not clear to me 
from the R docs if you can pass an arbitrary array of integers with any 
value, or if it needs something special.


Assuming you can give it any old passel of randomness as a seed, you 
just have to find a good source of randomness to create that seed.  On a 
Linux box, you could concatenate several dozen bytes read from 
/dev/random, the current wall clock time in microseconds, the inode of 
the R script being run, the process ID of the R interpreter, and the 
current mouse cursor position into a single string.  Feed all that into 
a hash algorithm, and break off pieces of that 4 bytes long, cast them 
to integers, and send that array of ints to set.seed().


If you use SHA-256 as the hash algorithm, that scheme should give you 
enough input randomness to get any of the possible 2^256 hash outputs, 
making that the amount of possible problem sets.  That's more than a 
rainbow table buster...there aren't enough atoms in the visible universe 
to construct a computer big enough to cope with 2^256 possible outputs.


That said, the quality of the PRNG just *allows* you to avoid screwing 
up.  It doesn't make it impossible make a weak algorithm.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can you tell what .Random.seed *was*?

2009-05-15 Thread Stavros Macrakis
On Fri, May 15, 2009 at 12:07 PM, Stavros Macrakis
macra...@alum.mit.edu wrote:
 system.time(whatseed(runif(1)))

Sorry, though I got lucky and my overall result is roughly correct,
this is an incorrect time measure.  It should be

r - runif(1); system.time(whatseed(r))

because R's call-by-need semantics don't evaluate the runif before it
starts running whatseed.  The correct time (on my machine) is then 28
hours, not 18.

Better to avoid side-effect functions as arguments

 -s

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can you tell what .Random.seed *was*?

2009-05-15 Thread Dirk Eddelbuettel

On 15 May 2009 at 13:08, G. Jay Kerns wrote:
| Thanks very much to Warren and Stavros for their additional insight.
| Putting all of this together, I think I am now ready to formulate my
| question intelligently:
| 
| Using Sweave, I want to distribute randomly generated problems AND
| answers to both teacher AND student.
| 
| More precisely, I want to distribute:
| 1) the .Rnw file
| 2) the .RData file saved near the end of the Sweave process.
| 
| I want it to be *easy* for the Instructor to change my seed and
| generate new problems.
| 
| I want it to be *difficult* for students to figure out the seed and
| automatically generate solutions on their own.
|
| Of course, difficult is a relative term, since what is difficult
| for them may well be easy for me, and what is difficult for me will
| be trivial to cryptographers and some people on this list.  The
| audience would be, say, upper division undergraduate students at a
| public university.
| 
| 
| What is clear so far: a brute force search of set.seed() is really
| pretty easy and fast... even for students at this level.
| 
| However, relating to Duncan's second remark:  what if the Instructor
| inserted an *unknown* very large number of calls to the RNG near the
| beginning of the .Rnw (but after the set.seed)...  and did not
| distribute this information to the students...  that would make it
| much harder, yes?
| 
| Any ideas that are even better than this?

You could use (one or more) seeds from a hardware RNGs.

The website http://random.org by Mads Haahr distributes such numbers (and my
CRAN package 'random' gets them for you in a convenient fashion).  Have a
look at the docs at random.org, and the two vignettes in the random package:

   RANDOM.ORG offers true random numbers to anyone on the Internet. The
   randomness comes from atmospheric noise, which for many purposes is better
   than the pseudo-random number algorithms typically used in computer
   programs. People use RANDOM.ORG for holding drawings, lotteries and
   sweepstakes, to drive games and gambling sites, for scientific
   applications and for art and music.  

Hth, Dirk

-- 
Three out of two people have difficulties with fractions.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] can you tell what .Random.seed *was*?

2009-05-14 Thread G. Jay Kerns
Dear R-help,

Suppose I write a script that looks something like this:

 script.R

set.seed(something)

x - rnorm(100)
y - runif(500)
# bunch of other stuff

save.image()

### end of script.R


Now, I give you a copy of my script.R (with the set.seed statement
removed, of course) together with the .RData file that was generated
by the save.image() command.



Question:
1) can you tell me what my original set.seed() value was?  (I wouldn't
be able to figure it out, but maybe someone can)

2) is it possible *in principle* to figure out what set.seed was,
given the above?  (OK, being able to figure it out in 2*10^68 years
doesn't count, but within a couple months is acceptable.)

3) does the answer change if there is a

remove(.Random.seed)

command right before the save.image() command?


Any thoughts would be appreciated.

Cheers,
Jay







***
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics  Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gke...@ysu.edu
http://www.cc.ysu.edu/~gjkerns/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can you tell what .Random.seed *was*?

2009-05-14 Thread Duncan Murdoch

On 5/14/2009 3:36 PM, G. Jay Kerns wrote:

Dear R-help,

Suppose I write a script that looks something like this:

 script.R

set.seed(something)

x - rnorm(100)
y - runif(500)
# bunch of other stuff

save.image()

### end of script.R


Now, I give you a copy of my script.R (with the set.seed statement
removed, of course) together with the .RData file that was generated
by the save.image() command.



Question:
1) can you tell me what my original set.seed() value was?  (I wouldn't
be able to figure it out, but maybe someone can)


The only way I know is to test all 2^32 possible values of the seed.  I 
think cryptographers would know faster ways.


2) is it possible *in principle* to figure out what set.seed was,
given the above?  (OK, being able to figure it out in 2*10^68 years
doesn't count, but within a couple months is acceptable.)


I think the brute force search would take about 4-5 days on my old 
computer, so a fast one could do it in a few hours.




3) does the answer change if there is a

remove(.Random.seed)

command right before the save.image() command?


Not with the brute force search.  The cryptographers would probably be 
able to make use of the random seed pretty well.


Now, if the seed was removed just before the values were generated, the 
seed would be generated from the system clock.  If you knew the time 
that this occurred approximately, the search could be a lot faster.


On the other hand, if there was an unknown large amount of work between 
the time of seed generation and the generation of the values x and y, it 
would be much more difficult to brute force the solution, because most 
of the values of .Random.seed are not reachable either by the clock 
method or set.seed, they are only reachable by lots of calls to the 
underlying generator.  So for each seed value you'd need to do every 
possible number of RNG generations before generating x and y.


Duncan Murdoch






Any thoughts would be appreciated.

Cheers,
Jay







***
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics  Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gke...@ysu.edu
http://www.cc.ysu.edu/~gjkerns/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] can you tell what .Random.seed *was*?

2009-05-14 Thread G. Jay Kerns
Dear Duncan,

On Thu, May 14, 2009 at 4:39 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 5/14/2009 3:36 PM, G. Jay Kerns wrote:

 Question:
 1) can you tell me what my original set.seed() value was?  (I wouldn't
 be able to figure it out, but maybe someone can)

 The only way I know is to test all 2^32 possible values of the seed.  I
 think cryptographers would know faster ways.

 2) is it possible *in principle* to figure out what set.seed was,
 given the above?  (OK, being able to figure it out in 2*10^68 years
 doesn't count, but within a couple months is acceptable.)

 I think the brute force search would take about 4-5 days on my old computer,
 so a fast one could do it in a few hours.


 3) does the answer change if there is a

 remove(.Random.seed)

 command right before the save.image() command?

 Not with the brute force search.  The cryptographers would probably be able
 to make use of the random seed pretty well.

 Now, if the seed was removed just before the values were generated, the seed
 would be generated from the system clock.  If you knew the time that this
 occurred approximately, the search could be a lot faster.

 On the other hand, if there was an unknown large amount of work between the
 time of seed generation and the generation of the values x and y, it would
 be much more difficult to brute force the solution, because most of the
 values of .Random.seed are not reachable either by the clock method or
 set.seed, they are only reachable by lots of calls to the underlying
 generator.  So for each seed value you'd need to do every possible number of
 RNG generations before generating x and y.

 Duncan Murdoch


Thanks.  That is exactly what I was looking for.

R-help is awesome.
Jay

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.