G. Jay Kerns wrote:

> I want it to be *difficult* for students to figure out the seed and
> automatically generate solutions on their own.

Hmmm... Would it really be a bad thing if someone reverse-engineered this to generate answers from the problem set? If that's hard enough to do, it'd be more worth solving than the problem set itself. I call that "extra credit".

> a brute force search of set.seed() is really
> pretty easy and fast... even for students at this level.

Either you're misunderstanding Stavros' benchmark results, or I am. Could easily be the latter... I'm an R newbie.

As far as I can tell, the inner part of the loop does very little. If that's right, Stavros is saying it will take 18 hours to try every possible seed when the work done for each seed takes almost no time. But if generating each problem set takes, say, a minute, building a complete rainbow table over all 2^32 possible seeds will take more than 8,000 years.
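The arithmetic is easy to check. Here's a quick Python sketch (my choice of language, since this is just arithmetic). The per-seed timings are the hypothetical figures above, not measurements, and it assumes one machine with no parallelism:

```python
# Back-of-the-envelope cost of enumerating every seed, assuming a fixed
# cost per seed on a single machine (no parallelism).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def enumeration_years(num_seeds: int, seconds_per_seed: float) -> float:
    """Years needed to try every seed exactly once."""
    return num_seeds * seconds_per_seed / SECONDS_PER_YEAR

SEEDS = 2 ** 32

# An 18-hour sweep over all seeds implies roughly 15 microseconds per seed.
per_seed = 18 * 3600 / SEEDS
print(f"implied cost per seed: {per_seed * 1e6:.1f} microseconds")

# If each problem set instead takes one minute to generate:
print(f"one minute per seed: {enumeration_years(SEEDS, 60):,.0f} years")
```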

> what if the Instructor
> inserted an *unknown* very large number of calls to the RNG near the
> beginning of the .Rnw (but after the set.seed)...  and did not
> distribute this information to the students...  that would make it
> much harder, yes?

There are better ways.
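To illustrate why I don't think a hidden burn-in buys much, here's a toy Python sketch. Python's `random` module stands in for R's RNG, `make_problem_set` and `crack` are made-up names, and the seed space is shrunk to 10 bits so it runs quickly. The unknown number of calls just becomes one more loop for the attacker: it multiplies the work by the number of plausible burn-in values, which has the same effect as making each iteration slower.

```python
import random

def make_problem_set(seed: int, burn_in: int) -> list[int]:
    """Toy stand-in for the .Rnw: seed the PRNG, discard `burn_in` draws,
    then draw five 'problem parameters'."""
    rng = random.Random(seed)
    for _ in range(burn_in):
        rng.random()  # the instructor's secret, discarded draws
    return [rng.randint(1, 1000) for _ in range(5)]

def crack(observed: list[int], seed_bits: int, max_burn_in: int):
    """Try every (seed, burn_in) pair until one reproduces `observed`.

    The hidden burn-in only multiplies the search by (max_burn_in + 1).
    Returns (seed, burn_in), or None if nothing matches.
    """
    for seed in range(2 ** seed_bits):
        for burn_in in range(max_burn_in + 1):
            if make_problem_set(seed, burn_in) == observed:
                return seed, burn_in
    return None

# The student sees only the generated problem set, not the seed or burn-in.
secret = make_problem_set(seed=777, burn_in=37)
found = crack(secret, seed_bits=10, max_burn_in=50)
print(found)
```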

As above, one key to making rainbow tables impractical is making the per-iteration time long enough. Even if it only takes a second to generate each possible problem set, that's enough when multiplied by high enough powers of 2.

The other key is making the seed space a big enough power of 2.
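To put numbers on that, here's a Python sketch that holds the per-candidate cost at one second and varies only the size of the seed space (one machine, no parallelism assumed). Each extra bit doubles the work: 2^32 candidates already take about 136 years, and 2^64 takes hundreds of billions of years.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_at_one_second(bits: int) -> float:
    """Years to try all 2**bits seeds at one second per candidate."""
    return 2 ** bits / SECONDS_PER_YEAR

for bits in (32, 64, 128, 256):
    print(f"2^{bits} seeds: {years_at_one_second(bits):.3g} years")
```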

I hadn't looked into R's random number generation before, but it appears quite robust. Seeding it with the current wall clock time (a 32-bit integer on most systems) is an insult to its capability.

The default pseudo-random number generator (PRNG) in my copy of R is the Mersenne Twister, a truly awesome algorithm. It's capable of very high-quality results, as long as you give it a good seed. It will take a vector of *many* integers as a seed, not just one. It's not clear to me from the R docs whether you can pass an arbitrary array of integers with any value, or if it needs something special.
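I can't speak to R's interface here, but Python's `random` module uses the same Mersenne Twister (MT19937) and shows the idea. When seeded with an int, CPython splits the absolute value into 32-bit words and feeds all of them to the generator's array-seeding routine, so every bit of a large seed matters. A minimal sketch, assuming CPython 3:

```python
import random

# A 256-bit seed: CPython seeds MT19937 from all eight 32-bit words of it.
big_seed = 2 ** 255 + 12345
a = random.Random(big_seed)
b = random.Random(big_seed)
c = random.Random(big_seed ^ (1 << 200))  # differs only in one high-order bit

first_a = [a.random() for _ in range(3)]
first_b = [b.random() for _ in range(3)]
first_c = [c.random() for _ in range(3)]

print(first_a == first_b)  # same seed reproduces the same stream
print(first_a == first_c)  # flipping a single high bit changes the stream
```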

Assuming you can give it any old passel of randomness as a seed, you just have to find a good source of randomness to create that seed. On a Linux box, you could concatenate several dozen bytes read from /dev/random, the current wall clock time in microseconds, the inode of the R script being run, the process ID of the R interpreter, and the current mouse cursor position into a single string. Feed all that into a hash algorithm, break the digest into 4-byte pieces, cast each piece to an integer, and send that array of ints to set.seed().
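Here's a rough Python sketch of that recipe, with some substitutions I should flag: `os.urandom()` stands in for reading /dev/random, the inode of the current working directory stands in for the inode of the R script, the mouse position is left out (there's no portable standard-library call for it), and `gather_entropy` and `seed_words` are made-up names. Since Python's generator takes one big integer rather than an array, the last step packs the eight words back together.

```python
import hashlib
import os
import random
import struct
import time

def gather_entropy() -> bytes:
    """Concatenate several entropy sources into one byte string."""
    parts = [
        os.urandom(48),                             # several dozen bytes from the OS
        str(time.time_ns() // 1000).encode(),       # wall clock time in microseconds
        str(os.stat(os.getcwd()).st_ino).encode(),  # inode (working directory here)
        str(os.getpid()).encode(),                  # process ID of the interpreter
    ]
    return b"|".join(parts)

def seed_words(entropy: bytes) -> list[int]:
    """Hash with SHA-256 and split the 32-byte digest into eight 4-byte ints."""
    digest = hashlib.sha256(entropy).digest()
    return list(struct.unpack(">8I", digest))

words = seed_words(gather_entropy())
print(words)

# Python's Mersenne Twister takes a single int, so pack the words back together.
rng = random.Random(int.from_bytes(struct.pack(">8I", *words), "big"))
print(rng.randint(1, 100))
```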

If you use SHA-256 as the hash algorithm, that scheme should give you enough input randomness to reach any of the 2^256 possible hash outputs, making 2^256 the number of possible seeds, and thus the upper limit on distinct problem sets. That's more than a rainbow table buster... there aren't enough atoms in the visible universe to construct a computer big enough to cope with 2^256 possible outputs.

That said, the quality of the PRNG just *allows* you to avoid screwing up. It doesn't make it impossible to write a weak algorithm.
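A toy Python example of that failure mode (`weak_problem`, `VARIANTS`, and `SAMPLE_SIZES` are hypothetical): the seed is a full 256 bits, but the generator only picks among a handful of variants, so the set of possible problem sets stays tiny and a student can precompute every answer.

```python
import random

# Hypothetical, deliberately weak problem generator: the seed is huge, but
# each problem only chooses among a few variants.
VARIANTS = ["mean", "median", "variance", "IQR"]
SAMPLE_SIZES = [10, 20, 50]

def weak_problem(seed: int) -> tuple[str, int]:
    """Return (statistic to compute, sample size) for a given seed."""
    rng = random.Random(seed)
    return rng.choice(VARIANTS), rng.choice(SAMPLE_SIZES)

# Try 10,000 random 256-bit seeds and collect the distinct problems produced.
seen = {weak_problem(random.getrandbits(256)) for _ in range(10_000)}

# At most 4 * 3 = 12 distinct problems can ever appear, whatever the seed.
print(len(seen))
```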

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
