Prof Ripley, You are absolutely correct, this code will not work at all - for starters, M isn't correctly initialized, etc. I edited my code in tinn-r and never ran it before posting... my apologies, I always seem to be too quick off the mark to reply - despite the 4*runif(1) suggestion in the posting guide...
I hadn't realized before what significant difference the replace=TRUE would make to the runtime... Now I can just use the sample() code you suggested and remove my runif() code altogether, as sample(t+1, 1, replace=TRUE) - 1 will work fine with t <- 2e9 which is considerably more that I need. Thanks again to both Prof Ripley and Duncan Murdoch, Sean O'Riordain affiliation <- NULL On 19/09/06, Prof Brian Ripley <[EMAIL PROTECTED]> wrote: > On Tue, 19 Sep 2006, Sean O'Riordain wrote: > > > Hi Duncan, > > > > Thanks for that. In the light of what you've suggested, I'm now using > > the following: > > > > # generate a random integer from 0 to t (inclusive) > > if (t < 10000000) { # to avoid memory problems... > > M <- sample(t, 1) > > } else { > > while (M > t) { > > M <- as.integer(urand(1,min=0, max=t+1-.Machine$double.eps)) > > } > > } > > sample(t, 1) is a sample from 1:t, not 0:t. > > You need > > sample(t+1, 1, replace=TRUE) - 1 > > which works in all cases up to INT_MAX-1, and beyond that you need to > worry about the resolution of the RNG (and to use floor not as.integer). > > There is no such thing as urand in base R .... > > > > > cheers and Thanks, > > Sean > > > > On 18/09/06, Duncan Murdoch <[EMAIL PROTECTED]> wrote: > >> On 9/18/2006 3:37 AM, Sean O'Riordain wrote: > >>> Good morning, > >>> > >>> I'm trying to concisely generate a single integer from 0 to n > >>> inclusive, where n might be of the order of hundreds of millions. > >>> This will however be used many times during the general procedure, so > >>> it must be "reasonably efficient" in both memory and time... (at some > >>> later stage in the development I hope to go vectorized) > >>> > >>> The examples I've found through searching RSiteSearch() relating to > >>> generating random integers say to use : sample(0:n, 1) > >>> However, when n is "large" this first generates a large sequence 0:n > >>> before taking a sample of one... this computer doesn't have the memory > >>> for that! > >> > >> You don't need to give the whole vector: just give n, and you'll get > >> draws from 1:n. The man page is clear on this. > >> > >> So what you want is sample(n+1, 1) - 1. (Use "replace=TRUE" if you want > >> a sample bigger than 1, or you'll get sampling without replacement.) > >>> > >>> When I look at the documentation for runif(n, min, max) it states that > >>> the generated numbers will be min <= x <= max. Note the "<= max"... > >> > >> Actually it says that's the range for the uniform density. It's silent > >> on the range of the output. But it's good defensive programming to > >> assume that it's possible to get the endpoints. > >> > >>> > >>> How do I generate an x such that the probability of being (the > >>> integer) max is the same as any other integer from min (an integer) to > >>> max-1 (an integer) inclusive... My attempt is: > >>> > >>> urand.int <- function(n,t) { > >>> as.integer(runif(n,min=0, max=t+1-.Machine$double.eps)) > >>> } > >>> # where I've included the parameter n to help testing... > >> > >> Because of rounding error, t+1-.Machine$double.eps might be exactly > >> equal to t+1. I'd suggest using a rejection method if you need to use > >> this approach: but sample() is better in the cases where as.integer() > >> will work. > >> > >> Duncan Murdoch > >>> > >>> is floor() "better" than as.integer? > >>> > >>> Is this correct? Is the probability of the integer t the same as the > >>> integer 1 or 0 etc... I have done some rudimentary testing and this > >>> appears to work, but power being what it is, I can't see how to > >>> realistically test this hypothesis. > >>> > >>> Or is there a a better way of doing this? > >>> > >>> I'm trying to implement an algorithm which samples into an array, > >>> hence the need for an integer - and yes I know about sample() thanks! > >>> :-) > >>> > >>> { incidentally, I was surprised to note that the maximum value > >>> returned by summary(integer_vector) is "pretty" and appears to be > >>> rounded up to a "nice round number", and is not necessarily the same > >>> as max(integer_vector) where the value is large, i.e. of the order of > >>> say 50 million } > >>> > >>> Is version etc relevant? (I'll want to be portable) > >>>> version _ > >>> platform i386-pc-mingw32 > >>> arch i386 > >>> os mingw32 > >>> system i386, mingw32 > >>> status > >>> major 2 > >>> minor 3.1 > >>> year 2006 > >>> month 06 > >>> day 01 > >>> svn rev 38247 > >>> language R > >>> version.string Version 2.3.1 (2006-06-01) > >>> > >>> Many thanks in advance for your help. > >>> Sean O'Riordain > >>> affiliation <- NULL > >>> > >>> ______________________________________________ > >>> R-help@stat.math.ethz.ch mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >> > >> > > > > ______________________________________________ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Brian D. Ripley, [EMAIL PROTECTED] > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.