Cheez wrote:
Howdy,
Hello,
Scripting with Perl is a hobby and not a vocation, so I apologize in advance for rough-looking code. I have a very large list of 16-letter words called "hashsequence16.txt". This file is 203MB in size. I have a large list of data called "newrawdata.txt". This file is 95MB. For each 16-letter word, I am looping through "newrawdata.txt" to 1) find a match and 2) take the full line of newrawdata.txt and associate that with the 16-letter word. Using a filesize line-counter and timing how long it takes to process my data lets me know that I have 9534 hours to see if I can find an alternative solution. It's pretty brute force but I don't know if there is another way to do it. Any comments or guidance would be greatly appreciated.
Thanks,
Dan
==========================================
use warnings; use strict;
print "**fisher**";
$flatfile = "newrawdata.txt";
my $flatfile = 'newrawdata.txt';
# 95MB in size
$datafile = "hashsequence16.txt";
my $datafile = 'hashsequence16.txt';
# 203MB in size
my $seqparsed = 'fishersearch.txt';
my $filesize = -s "hashsequence16.txt";
You already have the string "hashsequence16.txt" stored in the variable $datafile so why not use that instead:
my $filesize = -s $datafile;
# for use in processing time calculation
open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";
open(FILE2, "$datafile") || die "Can't open '$flatfile': $!\n";
There is no need to put a lone variable in quotes; see:
perldoc -q "What.s wrong with always quoting ..vars."
Also, your error message for $datafile says it couldn't open $flatfile!
open (SEQFILE, ">fishersearch.txt") || die "Can't open '$seqparsed': $! \n";
Modern Perl idiom is to use a lexical filehandle, three argument open and the lower precedence 'or' operator:
open my $FILE, '<', $flatfile or die "Can't open '$flatfile': $!\n";
open my $FILE2, '<', $datafile or die "Can't open '$datafile': $!\n";
open my $SEQFILE, '>', $seqparsed or die "Can't open '$seqparsed': $!\n";
@preparse = <FILE>;
Since you are going to be removing the newlines anyway:
chomp( my @preparse = <FILE> );
@hashdata = <FILE2>;
It looks like you don't really need to store this whole file in memory.
close(FILE);
close(FILE2);
for my $list1 (@hashdata) {
You could probably just read through this file normally:
while ( my $list1 = <FILE2> ) {
# iterating through hash16 data
$finish++;
And if you use a while loop you can use $. to get the current line number.
if ($finish ==10 ) { # line counter
$marker = $marker + $finish;
$finish =0;
$left = $filesize - $marker;
$marker is based on the line number and $filesize is based on the number of bytes in the file so this calculation makes no sense. Perhaps you want this instead:
# outside the loop declare $left
my $left = $filesize;
# then here in the loop
$left -= length $list1;
printf "$left\/$filesize\n";
printf() treats its first argument as a format string so that should be either:
printf "%s/%s\n", $left, $filesize;
Or just:
print "$left/$filesize\n";
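To show how the last few suggestions fit together (the while loop, $. for the line number, and a byte-based $left), here is a minimal sketch. The sub name read_with_progress and its $every parameter are made up for illustration and are not part of your script; it collects the progress strings instead of printing them so the result is easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): read a file line by
# line and record "bytes left/total bytes" every $every lines.
sub read_with_progress {
    my ( $path, $every ) = @_;
    my $filesize = -s $path;
    my $left     = $filesize;
    my @report;

    open my $fh, '<', $path or die "Can't open '$path': $!\n";
    while ( my $list1 = <$fh> ) {
        # bytes still unread; length counts the trailing newline too
        $left -= length $list1;

        # $. is the current line number of the filehandle last read from
        push @report, "$left/$filesize" if $. % $every == 0;
    }
    close $fh;
    return @report;
}

# Usage: perl progress.pl hashsequence16.txt
if (@ARGV) {
    print "$_\n" for read_with_progress( $ARGV[0], 10 );
}
```

This assumes the file is read without an encoding layer, so that length in characters equals length in bytes.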
# this prints every 17 seconds
}
($line, $freq) = split(/\t/, $list1);
You never use $freq anywhere so just:
my ( $line ) = split /\t/, $list1;
for my $rawdata (@preparse) { # iterating through rawdata
$rawdata=~ s/\n//;
if ($rawdata =~ m/$line/) { # matching hash16 word with rawdata line
my $first_pos = index $rawdata,$line;
You could combine the last two statements:
if ( ( my $first_pos = index $rawdata, $line ) >= 0 ) {
print SEQFILE "$first_pos\t$rawdata\n"; # printing to info to new file
}
}
print SEQFILE "PROCESS\t$line\n"; # printing hash16 word and "process"
}
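As for an alternative to the brute-force nested loop: one possibility, if every word really is exactly 16 characters, is to turn the problem around. Load the words into a hash once, then slide a 16-character window over each raw data line and look each window up in the hash. The sketch below is only that, a sketch under assumptions: find_words and its parameters are names I made up, it assumes the word is the first tab-separated field, and it assumes the word list fits in memory as a hash (which will likely need considerably more RAM than the 203MB file). Note also that its output is grouped by raw data line rather than by word, and it writes no "PROCESS" marker lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): index the words
# once, then scan each raw line with a fixed-width sliding window.
sub find_words {
    my ( $word_file, $raw_file, $out_file, $width ) = @_;

    # Assumption: the first tab-separated field of each line is the word.
    my %is_word;
    open my $words, '<', $word_file or die "Can't open '$word_file': $!\n";
    while ( my $entry = <$words> ) {
        chomp $entry;
        my ($word) = split /\t/, $entry;
        $is_word{$word} = 1 if defined $word && length $word == $width;
    }
    close $words;

    open my $raw, '<', $raw_file or die "Can't open '$raw_file': $!\n";
    open my $out, '>', $out_file or die "Can't open '$out_file': $!\n";
    while ( my $rawdata = <$raw> ) {
        chomp $rawdata;
        my %seen;    # report only the first position per word, like index()

        # Empty range when the line is shorter than $width.
        for my $pos ( 0 .. length($rawdata) - $width ) {
            my $candidate = substr $rawdata, $pos, $width;
            next unless $is_word{$candidate};
            next if $seen{$candidate}++;
            print {$out} "$candidate\t$pos\t$rawdata\n";
        }
    }
    close $raw;
    close $out or die "Can't close '$out_file': $!\n";
    return;
}

# Usage: perl findwords.pl hashsequence16.txt newrawdata.txt fishersearch.txt
if ( @ARGV == 3 ) {
    find_words( @ARGV, 16 );
}
```

Each raw data line is then read once, and the work per line depends on its length rather than on the number of words.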
John
--
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall