On 9/26/07, Pedro Soto <[EMAIL PROTECTED]> wrote:

> I need to derive a subsample with replacement from a large distribution of
> data. Say my large sample is 10000: I need to draw 100 data points out of
> the 10000 and repeat the procedure n times (that's what I call
> bootstrapping).

Perl can easily select 100 items at random from 10000, as many times
as you need.

> I am using Perl's srand function to generate random
> numbers in order to do the resampling at 'random'.

It's rare to need to call srand() yourself, since Perl seeds the
generator automatically the first time rand() is called. You probably
want just plain rand().
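
To illustrate the difference, here is a minimal sketch. The data in
@source and the seed value 42 are made up for the example; the point
is only that rand() works without any srand() call, and that an
explicit srand() is mostly useful when you want a repeatable run.

```perl
use strict;
use warnings;

# rand() seeds itself on first use, so no srand() call is needed here.
my @source = (1 .. 10);                      # hypothetical stand-in data
my $pick   = $source[ int rand @source ];    # one item, chosen uniformly

# Calling srand() with a fixed seed (42 is arbitrary) replays the
# same sequence of random numbers, which helps when debugging.
srand(42);
my @first  = map { int rand 1000 } 1 .. 5;
srand(42);
my @second = map { int rand 1000 } 1 .. 5;
# @first and @second now hold the same five numbers.
```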

> The problem is that the
> distribution of the original data (10000) does not follow a gaussian
> distribution and therefore I am not sure if using only this function
> (srand) in perl would be enough, because the numbers of the large
> distribution won't have the same probability of being selected.

rand() picks positions, not values, so the probability of an item
being selected doesn't depend on the item itself or on the shape of
the distribution it came from. This code pulls 100 samples at random,
without replacement, from a list (@source) of at least that many
items; every item has the same chance of ending up in @samples.

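  my $samples_needed = 100;
  die "Not enough data" if @source < $samples_needed;
  my $count = 0;
  my @samples;  # starts off empty
  foreach my $item (@source) {
    # Keep the $count-th item with probability $samples_needed / $count;
    # the first $samples_needed items are always kept.
    next if $samples_needed / (++$count) <= rand;
    if (@samples < $samples_needed) {
      push @samples, $item;
    } else {
      # Replace a randomly chosen earlier pick.
      $samples[rand @samples] = $item;  # random index
    }
  }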
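
The loop above never picks the same position twice, so it samples
without replacement. If you do want replacement, as the question asks,
something along these lines might serve as a starting point. It is
only a sketch: resample_with_replacement, the squared test data, and
the choice of 5 replicates and a mean as the statistic are all
hypothetical names and values picked for the example.

```perl
use strict;
use warnings;

# Draw $size items from @$data with replacement: each draw picks an
# index uniformly at random, so the same item can appear more than once.
sub resample_with_replacement {
    my ($data, $size) = @_;
    return map { $data->[ int rand @$data ] } 1 .. $size;
}

my @source     = map { $_ ** 2 } 1 .. 10_000;  # hypothetical non-Gaussian data
my $replicates = 5;                            # the "n times" from the question

# Collect one statistic (here the mean) per bootstrap replicate.
my @means;
for (1 .. $replicates) {
    my @sample = resample_with_replacement(\@source, 100);
    my $sum = 0;
    $sum += $_ for @sample;
    push @means, $sum / @sample;
}
```

Because every draw is an independent uniform choice of index, the
values in @source have no effect on which positions get picked.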

Does this get you any closer to a solution? Good luck with it!

--Tom Phoenix
Stonehenge Perl Training
