Sluggish code

venkates Mon, 11 Jun 2012 07:31:44 -0700

Hi all,

I am trying to filter files from a directory (code provided below) by comparing the contents of each file with a hash ref (a parsed id map file provided as an argument). The code works, but it is extremely slow. The .csv files (81 files) that I am reading are not very large (the largest file is 183,258 bytes). I would appreciate it if you could suggest improvements to the code.


sub filter {
    my ( $pazar_dir_path, $up_map, $output ) = @_;
    croak "Not enough arguments! " if ( @_ < 3 );

    my $accepted = 0;
    my $rejected = 0;

    opendir DH, $pazar_dir_path
        or croak ("Error in opening directory '$pazar_dir_path': $!");
    open my $OUT, '>', $output
        or croak ("Cannot open file for writing '$output': $!");

    while ( my @data_files = grep(/\.csv$/,readdir(DH)) ) {
        my @records;
        foreach my $file ( @data_files ) {

            open my $FH, '<', "$pazar_dir_path/$file"
                or croak ("Cannot open file '$file': $!");

            while ( my $data = <$FH> ) {
                chomp $data;
                my $record_output;
                @records = split /\t/, $data;
                foreach my $up_acs ( keys %{$up_map} ) {

                    foreach my $ensemble_id ( @{ $up_map->{$up_acs}{'Ensembl_TRS'} } ) {

                        if ( $records[1] eq $ensemble_id ) {
                            $record_output = join( "\t", @records );
                            print $OUT "$record_output\n";
                            $accepted++;
                        }
                        else {
                            $rejected++;
                            next;
                        }
                    }
                }
            }
            close $FH;
        }
    }
    close $OUT;
    closedir (DH);
    print "accepted records: $accepted\n, rejected records: $rejected\n";
    return $output;
}

__DATA__

TF0000210 ENSMUST00000001326 SP1_MOUSE GS0000422 ENSMUSG00000037974 7 148974877 149005136 Mus musculus MUC5AC 14570593 ELECTROPHORETIC MOBILITY SHIFT ASSAY (EMSA)::SUPERSHIFT
TF0000211 ENSMUST00000066003 SP3_MOUSE GS0000422 ENSMUSG00000037974 7 148974877 149005136 Mus musculus MUC5AC 14570593 ELECTROPHORETIC MOBILITY SHIFT ASSAY (EMSA)::SUPERSHIFT



Thanks a lot,

Aravind
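
One possible direction, sketched under assumptions rather than taken from the post itself: the nested loops over every key and every 'Ensembl_TRS' entry of $up_map run once per input line, so the work grows with (lines x ids). Flattening the ids into a lookup hash once turns each record check into a single hash lookup. The helper names build_id_lookup and record_matches, and the placeholder key P00001, are hypothetical; only the 'Ensembl_TRS' key and the use of the second tab-separated field come from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten the id map into a set of transcript ids.
# Assumes each value of $up_map holds an array ref under 'Ensembl_TRS',
# as in the filter sub from the post.
sub build_id_lookup {
    my ($up_map) = @_;
    my %wanted;
    for my $entry ( values %{$up_map} ) {
        $wanted{$_} = 1 for @{ $entry->{'Ensembl_TRS'} || [] };
    }
    return \%wanted;
}

# Hypothetical helper: true if the second tab-separated field of a line
# is one of the wanted transcript ids.
sub record_matches {
    my ( $line, $wanted ) = @_;
    my @fields = split /\t/, $line;
    return defined $fields[1] && exists $wanted->{ $fields[1] };
}

# Made-up map with the same shape as $up_map; P00001 is a placeholder.
my $up_map = { P00001 => { Ensembl_TRS => ['ENSMUST00000001326'] } };
my $wanted = build_id_lookup($up_map);    # built once, before reading files

for my $line (
    "TF0000210\tENSMUST00000001326\tSP1_MOUSE",
    "TF0000211\tENSMUST00000066003\tSP3_MOUSE",
  )
{
    my $verdict = record_matches( $line, $wanted ) ? 'keep' : 'skip';
    print "$verdict: $line\n";
}
```

Note that this counts and prints each record at most once, whereas the loops in the post increment $rejected for every non-matching id and would print a record again for each accession that lists the same transcript id. Whether that difference matters depends on what the counters are meant to report.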

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
