Hi,
I'm a *real* newbie when it comes to threading (and rather new to
programming in general). After reading the documentation for the
Thread.pm and threads.pm modules, I believe I wrote something
that should work just fine...
However, if I'm using the Thread.pm module, the following code always
blocks while waiting for data from the queue in the worker threads,
regardless of what the main thread pushes onto the queue. It seems like
the threads each get their own copy of a queue? Is there any way to
ensure that there's only *one* copy of the queue?
Using the threads.pm module, I tried using threads::shared to share
only the queue, but when pushing scalars (or refs) onto the shared
queue, I get a message that the scalar isn't shared, so I marked the
HTML::TreeBuilder object as shared as well. Now the parsing operation
fails with the error "_hparser_xs_state element is not a reference at
<snip>/HTML/Parser.pm line 104"...
Anybody have any suggestions for what I'm doing wrong when trying to use
the queue? (or at the very least, have any good online docs I can look at?)
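For reference, here is a minimal sketch of the pattern I'm trying to get working, assuming threads.pm and Thread::Queue, where only plain strings go through the queue. The file names and the worker() sub are made up for illustration and are not part of my real code; undef is used here as a "no more work" marker:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical file names, for illustration only
my @files = ('a.html', 'b.html', 'c.html');

# A single queue object, handed to every worker thread
my $queue = Thread::Queue->new;

# Hypothetical worker: pulls file names until it sees the undef marker
sub worker {
    my ($q) = @_;
    my @seen;
    # dequeue blocks until an item is available
    while (defined(my $file = $q->dequeue)) {
        push @seen, $file;    # the real code would parse $file here
    }
    return scalar @seen;
}

# Small fixed pool of workers
my @workers = map { threads->create(\&worker, $queue) } 1 .. 2;

# Feed plain scalars (file names) to the workers
$queue->enqueue($_) for @files;

# One stop marker per worker so each loop terminates
$queue->enqueue(undef) for @workers;

# Collect how many items each worker handled
my $total = 0;
for my $thr (@workers) {
    my ($count) = $thr->join;
    $total += $count;
}
print "processed $total of ", scalar(@files), " files\n";
```

The idea in this sketch is that each worker would build its own parser object inside the thread, so only the file name has to cross the thread boundary.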
TIA,
// Thomas
<Included Perl snippet>
# parseFile() - Publicly callable function
# Used to retrieve a file to parse from the database and "returns"
# a parsed HTML::Tree object.
#
sub parseFile {
    my $self = shift(@_);
    # Retrieve the database handle
    # Get an array of hashes containing the data we need.
    # We need the filename and its status (scanned=y/n).
    # We call sys_check::mysqlint->getData( <tablename>,
    # <wherestatement>, <columnarray>)
    # This is an array of records from a database
    my @hashref = @{$self->{'dbh'}->getFromDB("file_list",
        "scanned= \'no\'")};
    # Only allow 2 HTML::TreeBuilder objects to be active at one time
    my $maxqueuesize = 2;
    my $queue = new Thread::Queue;
    my $ready = new Thread::Semaphore;
    share($queue);
    share($ready);
    # Create a pool of threads (one per file we've got to process)
    foreach (1..@hashref) {
        new threads \&parseSysCheck, $ready, $queue;
    } # <end> foreach
    # Loop through the records and place a parsed tree into the
    # thread queue
    foreach (@hashref) {
        # Store HTML::TreeBuilder Object and filename in
        my $htmlTree = new HTML::TreeBuilder;
        share($htmlTree);
        # Ensure that we don't suck the system down
        yield while ($queue->pending >= $maxqueuesize);
        # Attempt to parse the HTML file
        warn "Unable to parse $$_{'filename'}"
            unless ($htmlTree->parse_file($$_{'filename'}));
        # Push HTML::TreeBuilder object and $filename
        # onto variable queue
        $queue->enqueue($htmlTree);
    } # <end> foreach
    # Wait for all of the threads to complete
    my @threads = threads->list();
    while (my $thread = shift @threads) {
        DEBUG("Waiting for thread #".$thread->tid." to end");
        my $success = $thread->join;
    } # <end> while
} # <end> parseFile()
# parseSysCheck($<filename>)
# Takes a string path to the sys_check file to parse
# returns a HTML::Tree object
#
sub parseSysCheck {
    # Save the arguments and thread reference
    my ($ready, $queue) = @_;
    my $thread = threads->self();
    my $self = _get_class();
    # Pick up a HTML::TreeBuilder object from the queue
    my $html = $queue->dequeue;
    if ($html) {
        my @hrefs = $html->look_down('_tag', 'a');
        if (@hrefs) {
            DEBUG(Dumper(@hrefs));
        } # <end> if
        # Clean up and signal that the threads are done
        $html->delete();
    } # <end> if
    # At some point, return something since everything worked out..
    # return(1);
} # <end> parseSysCheck