Hello all, I have a file of 3,210,008 CSV records and need to take a random sample of it. I tried hacking something together a while ago, but the output only ever contained 65,536 distinct records, repeated over and over. When I need a 500,000-record sample, that's a problem.
Here is my old code. I know the logic allows duplicates, but what would cause that limit? With 500,000 draws I would expect to land on far more than 65,536 distinct records, and 65,536 is too suspicious a number (2^16) for me to write off as coincidence. Thanks.

    #!/usr/local/bin/perl -w
    use strict;

    open(FILE, '<', 'consumer.sample.sasdump.txt') or die "Cannot open input: $!";
    open(NEW,  '>', 'consumer.new')                or die "Cannot open output: $!";

    # Slurp every record into memory, one line per element.
    my @data = <FILE>;

    # Draw 500,000 records, duplicates allowed.
    for (my $jess = 0; $jess < 500_000; $jess++) {
        my $index = int rand @data;    # random index into the array
        print NEW $data[$index];
    }

    close(FILE);
    close(NEW);
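In case it helps, here is a minimal sketch of the workaround I'm thinking of trying, untested, assuming the cap really does come from a 16-bit rand() on this build (which 65,536 = 2^16 suggests): combine two draws into one wide index so the index space isn't limited to 65,536 values.

    #!/usr/local/bin/perl -w
    # Sketch only: assumes rand() on this build has 16-bit resolution,
    # and widens the index space by combining two draws per record.
    use strict;

    open(FILE, '<', 'consumer.sample.sasdump.txt') or die "Cannot open input: $!";
    my @data = <FILE>;
    close(FILE);

    open(NEW, '>', 'consumer.new') or die "Cannot open output: $!";
    for (my $i = 0; $i < 500_000; $i++) {
        # Two 16-bit draws give roughly 32 bits of randomness,
        # so the index is no longer confined to 65,536 values.
        my $r = int(rand 65536) * 65536 + int(rand 65536);
        print NEW $data[$r % @data];   # modulo bias is negligible at this scale
    }
    close(NEW);

To confirm the diagnosis first, a one-liner like

    perl -le 'my %s; $s{rand()}++ for 1 .. 1_000_000; print scalar keys %s'

should print a number far above 65,536 if rand() is not the culprit.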