Unless I’m entirely wrong, it appears that you want shared memory across threads, which would let you keep out of apache altogether. Here is an example of using threads with shared memory that I worked out. We took a 4-hour serial task and turned it into 5 minutes with threads. This worked extremely well for me. I wasn’t using large hashes, but I was processing hundreds of files per thread, with 30k lines per file.
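Before the full example, here is the core idiom in miniature (the data below is made up, just to show the moving parts): a hash is marked shared, populated once, and then read concurrently by several threads.

use strict;
use warnings;

use threads;
use threads::shared;

# Hypothetical lookup table; in your case this would be loaded from files.
my %lookup : shared = ( foo => 1, bar => 2, baz => 3 );

# One thread per key; each thread only reads the shared hash, so no
# lock() is needed as long as nothing writes concurrently.
my @workers = map {
    threads->create(sub {
        my $key = shift;
        return $lookup{$key};
    }, $_);
} keys %lookup;

print $_->join(), "\n" for @workers;

The full example, with retry bookkeeping and a monitor loop: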
#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;
$Data::Dumper::Indent   = 1;
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Deepcopy = 1;

use threads;
use threads::shared;

use constant MAX_TRIES => 5;

sub sub_threads($$$);

my $switch = undef;
my $hash   = undef;
my $gsx    = undef;
my $cnt    = 5;

my %switches = (
    'A' => { 'b' => undef, 'c' => undef, 'd' => undef },
    'B' => { 'b' => undef, 'c' => undef, 'd' => undef },
    'C' => { 'b' => undef, 'c' => undef, 'd' => undef },
    'D' => { 'b' => undef, 'c' => undef, 'd' => undef },
    'E' => { 'b' => undef, 'c' => undef, 'd' => undef },
);

my %threads : shared = ();

######
## create the threads: one per switch/gsx pair, with staggered start delays
######
while (($switch, $hash) = each %switches) {
    unless (exists $threads{$switch}) {
        my %h : shared;
        $threads{$switch} = \%h;
    }
    while (($gsx, $_) = each %$hash) {
        unless (exists $threads{$switch}{$gsx}) {
            my %h : shared;
            $threads{$switch}{$gsx} = \%h;
        }
        unless (exists $threads{$switch}{$gsx}{'messages'}) {
            my @h : shared;
            $threads{$switch}->{$gsx}->{'messages'} = \@h;
        }
        $hash->{$gsx}->{'thread'} = threads->create(\&sub_threads, \$switch, \$gsx, \$cnt);
        $hash->{$gsx}->{'tries'}  = 1;
        $cnt += 5;
    }
}

#print Dumper \%threads;
#print Dumper \%switches;

######
## monitor loop: keep polling while threads are still running,
## restarting finished threads until MAX_TRIES is reached
######
$cnt = 1;
while ($cnt) {
    $cnt = 0;
    while (($switch, $hash) = each %switches) {
        while (($gsx, $_) = each %$hash) {
            if ($hash->{$gsx}->{'thread'}->is_running()) {
                $cnt = 1;
                # print "$switch->$gsx is running\n";
            }
            else {
                # print "$switch->$gsx is NOT running\n";
                # print "  Reason for failure :\n";
                # print '  ' . join('\n', @{$threads{$switch}->{$gsx}->{'messages'}}) . "\n";
                if ($hash->{$gsx}->{'tries'} < MAX_TRIES) {
                    # print "  max tries not reached for $switch->$gsx, will be trying again!\n";
                    $hash->{$gsx}->{'tries'}++;
                    $hash->{$gsx}->{'thread'} = threads->create(\&sub_threads, \$switch, \$gsx, \$cnt);
                    $cnt = 1;   # keep the monitor loop alive for the retried thread
                }
                else {
                    print "send email! $switch->$gsx\n";
                }
            }
        }
        sleep 2;
    }
}

#print Dumper \%threads;
#print Dumper \%switches;

sub sub_threads($$$) {
    my $ptr_switch = shift;
    my $ptr_gsx    = shift;
    my $ptr_tNum   = shift;

    sleep $$ptr_tNum;

    {
        lock(%threads);
        push @{$threads{$$ptr_switch}->{$$ptr_gsx}->{'messages'}},
            "Leaving thread $$ptr_switch->$$ptr_gsx";
        # lock freed at end of scope
    }

    return 0;
}

On Feb 2, 2015, at 10:11 PM, Alan Raetz <alanra...@gmail.com> wrote:

So I have a perl application that upon startup loads about ten perl hashes (some of them complex) from files. This takes up a few GB of memory and about 5 minutes. It then iterates through some cases and reads from (never writes to) these perl hashes. Processing all our cases (millions of them) takes about 3 hours. We would like to speed up this process.

I am thinking this is an ideal application for mod_perl because it would allow multiple processes that share memory. The scheme would be to load the hashes on apache startup and have a master program send requests with each case; the apache children would use the shared hashes.

I just want to verify some of the details about variable sharing. Would the following setup work (oversimplified, but you get the idea…):

In a file Data.pm, which I would use() in my Apache startup.pl, I would load the perl hashes and provide hash references retrieved through class methods:

package Data;

my %big_hash;

open(FILE, "file.txt");
while ( <FILE> ) {
    … code ….
    $big_hash{ $key } = $value;
}

sub get_big_hashref { return \%big_hash; }

<snip>

And so in the apache request handler, the code would be something like:

use Data;

my $hashref = Data::get_big_hashref();

…. code to access $hashref data with request parameters…..
<snip>

The idea is that the HTTP request/response will contain the relevant input/output for each case… and the master client program will collect these and concatenate the final output from all the requests.

So any issues/suggestions with this approach? I am facing a non-trivial task of refactoring the existing code to work in this framework, so I just wanted to get some feedback before I invest more time into this... I am planning on using mod_perl 2.07 on a linux machine.

Thanks in advance,

Alan
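To make the suggestion concrete for your workload, here is a rough sketch of the same idea applied to your hashes-plus-cases setup. Everything named load_hashes(), process_case(), and @all_cases is a placeholder for your existing code, and NUM_WORKERS is a guess; the point is only the shape: load once, shared_clone() the result, and fan the cases out over a Thread::Queue so no worker ever copies the data.

#!/usr/bin/env perl
use strict;
use warnings;

use threads;
use threads::shared;
use Thread::Queue;

use constant NUM_WORKERS => 8;

# Stubs standing in for your real code: load_hashes() is your
# ~5-minute load, process_case() is whatever one case needs.
sub load_hashes  { return { apple => 1, banana => 2 } }
sub process_case {
    my ($case, $h) = @_;
    return "case $case -> " . ($h->{$case} // 'n/a') . "\n";
}

my @all_cases = qw(apple banana cherry);   # stand-in for your millions of cases

# Load once in the parent, then share; threads created afterwards all
# see the same data instead of each cloning a few GB.
my $big_hash = shared_clone( load_hashes() );

my $cases   = Thread::Queue->new();
my $results = Thread::Queue->new();

my @workers = map {
    threads->create(sub {
        while (defined( my $case = $cases->dequeue() )) {
            # Read-only access, so no lock() is needed after the load.
            $results->enqueue( process_case($case, $big_hash) );
        }
    });
} 1 .. NUM_WORKERS;

$cases->enqueue($_)    for @all_cases;
$cases->enqueue(undef) for 1 .. NUM_WORKERS;   # one "done" sentinel per worker

$_->join() for @workers;

# All workers have finished, so the result queue can be drained non-blocking.
while (defined( my $out = $results->dequeue_nb() )) {
    print $out;
}

Each dequeue() blocks until a case or an undef sentinel arrives, so the workers exit cleanly once the queue is drained, and the parent just joins them and concatenates the results, which is essentially the role your master client program plays.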