Hi all,

I am trying to filter the files in a directory (code below) by comparing the contents of each file against a hash ref (a parsed ID map file passed in as an argument). The code works, but it is extremely slow, even though the 81 .csv files I am reading are not very large (the largest is 183,258 bytes). I would appreciate any suggestions for improving it.

use Carp qw(croak);

sub filter {
    croak "Not enough arguments!" if ( @_ < 3 );
    my ( $pazar_dir_path, $up_map, $output ) = @_;

    my $accepted = 0;
    my $rejected = 0;

    opendir my $DH, $pazar_dir_path
        or croak("Error in opening directory '$pazar_dir_path': $!");
    open my $OUT, '>', $output
        or croak("Cannot open file for writing '$output': $!");

    # readdir in list context returns every entry at once, so no loop is needed
    my @data_files = grep { /\.csv$/ } readdir($DH);
    closedir($DH);

    foreach my $file (@data_files) {
        open my $FH, '<', "$pazar_dir_path/$file"
            or croak("Cannot open file '$file': $!");
        while ( my $data = <$FH> ) {
            chomp $data;
            my @records = split /\t/, $data;
            my $matched = 0;

            # For every line, scan every accession and every transcript ID in the map
            ID_SCAN:
            foreach my $up_acs ( keys %{$up_map} ) {
                foreach my $ensemble_id ( @{ $up_map->{$up_acs}{'Ensembl_TRS'} } ) {
                    if ( $records[1] eq $ensemble_id ) {
                        $matched = 1;
                        last ID_SCAN;
                    }
                }
            }

            # Count each record once, as either accepted or rejected
            if ($matched) {
                print $OUT "$data\n";
                $accepted++;
            }
            else {
                $rejected++;
            }
        }
        close $FH;
    }
    close $OUT;
    print "accepted records: $accepted, rejected records: $rejected\n";
    return $output;
}
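A likely cause of the slowness is the inner double loop: for every line read, the code walks every key of `%$up_map` and every transcript ID stored under it, so the work grows with (lines read) × (IDs in the map). A common Perl remedy is to invert the map once into a hash keyed by Ensembl transcript ID and then do a single `exists` lookup per line. The sketch below assumes `$up_map` has the shape the posted code dereferences (`$up_map->{$acc}{'Ensembl_TRS'}` is an array ref of IDs) and that the transcript ID is the second tab-separated field. `filter_fast` is a hypothetical name; it prints each matching line once and counts each line once.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical faster variant of filter(): same arguments, same output file.
# Assumes $up_map->{$acc}{'Ensembl_TRS'} is an array ref of Ensembl transcript IDs.
sub filter_fast {
    croak "Not enough arguments!" if @_ < 3;
    my ( $pazar_dir_path, $up_map, $output ) = @_;

    # Build the lookup table once: transcript ID => 1.
    my %wanted;
    for my $entry ( values %{$up_map} ) {
        $wanted{$_} = 1 for @{ $entry->{'Ensembl_TRS'} || [] };
    }

    opendir my $dh, $pazar_dir_path
        or croak "Error in opening directory '$pazar_dir_path': $!";
    my @data_files = grep { /\.csv$/ } readdir $dh;
    closedir $dh;

    open my $out, '>', $output
        or croak "Cannot open file for writing '$output': $!";

    my ( $accepted, $rejected ) = ( 0, 0 );
    for my $file ( sort @data_files ) {
        my $path = "$pazar_dir_path/$file";
        open my $fh, '<', $path or croak "Cannot open file '$path': $!";
        while ( my $line = <$fh> ) {
            chomp $line;
            # Only the second field is needed, so stop splitting after 3 pieces.
            my ( undef, $trs_id ) = split /\t/, $line, 3;
            if ( defined $trs_id && exists $wanted{$trs_id} ) {
                print {$out} "$line\n";
                $accepted++;
            }
            else {
                $rejected++;
            }
        }
        close $fh;
    }
    close $out or croak "Cannot close '$output': $!";
    print "accepted records: $accepted, rejected records: $rejected\n";
    return $output;
}
```

Called the same way, e.g. `filter_fast( $pazar_dir_path, $up_map, 'filtered.tsv' )`. Building `%wanted` costs one pass over the map; after that each input line costs a single hash lookup, so the total work grows roughly with (IDs in the map) + (lines read) rather than their product. The `sort` on `@data_files` only makes the output order repeatable and can be dropped if order does not matter.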

__DATA__

TF0000210 ENSMUST00000001326 SP1_MOUSE GS0000422 ENSMUSG00000037974 7 148974877 149005136 Mus musculus MUC5AC 14570593 ELECTROPHORETIC MOBILITY SHIFT ASSAY (EMSA)::SUPERSHIFT
TF0000211 ENSMUST00000066003 SP3_MOUSE GS0000422 ENSMUSG00000037974 7 148974877 149005136 Mus musculus MUC5AC 14570593 ELECTROPHORETIC MOBILITY SHIFT ASSAY (EMSA)::SUPERSHIFT
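The sample records above appear with spaces where the tabs presumably were. The sketch below rebuilds them with tabs (the exact field boundaries are an assumption) to show which field the filter compares: index 1 holds the Ensembl transcript ID (`ENSMUST...`), while index 4 holds the gene ID (`ENSMUSG...`), so the map has to contain transcript IDs for a record to pass. `%wanted` is a hypothetical one-entry lookup table.

```perl
use strict;
use warnings;

# The two sample records, rebuilt with tabs between fields.
# Where the tabs fall is an assumption; the posted text shows only spaces.
my @tail = ( 'GS0000422', 'ENSMUSG00000037974', '7', '148974877', '149005136',
    'Mus musculus', 'MUC5AC', '14570593',
    'ELECTROPHORETIC MOBILITY SHIFT ASSAY (EMSA)::SUPERSHIFT' );
my @lines = (
    join( "\t", 'TF0000210', 'ENSMUST00000001326', 'SP1_MOUSE', @tail ),
    join( "\t", 'TF0000211', 'ENSMUST00000066003', 'SP3_MOUSE', @tail ),
);

# Hypothetical lookup table holding only the SP1 transcript ID.
my %wanted = ( ENSMUST00000001326 => 1 );

# Field 1 is the transcript ID (ENSMUST...); field 4 is the gene ID (ENSMUSG...).
my @kept = grep { exists $wanted{ ( split /\t/ )[1] } } @lines;

for my $line (@kept) {
    my @fields = split /\t/, $line;
    print "kept $fields[0] ($fields[2]), transcript $fields[1]\n";
}
```

Both sample records share the same gene ID, so filtering on field 4 instead would keep both; filtering on field 1 keeps only the SP1 record.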


Thanks a lot,

Aravind

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/