Not sure if this is what people are running into, but if you use
variables in a subclass of HTML::Parser, even lexicals scoped at the
package (file) level, they won't get reset when you call new() on your
class unless you override the default new() or otherwise reset them.
They belong to the file, not to any one object, so under mod_perl they
live as long as the Apache child does.

For example (untested, but this is approximately what I recall doing
myself):

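package Foo::Parser;
use HTML::Parser;
our @ISA = qw(HTML::Parser);
my $foo;    # file-scoped lexicals, shared by every Foo::Parser object
my $bar;

sub text {
    my ($self, $text) = @_;
    $foo .= $text;    # e.g. accumulate the text seen so far
    $bar++;
}

package MyMain;

my $p = Foo::Parser->new;
# $foo and $bar are empty
$p->parse('<p>some text</p>');
$p->eof;
# $foo and $bar are now set
my $q = Foo::Parser->new;
# $foo and $bar are still set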

I run into this a lot even outside of mod_perl when using the same parser
twice.  It might also apply to data kept in the blessed hashref; I don't
recall.  In any case, I usually just add a sub reset() to my
HTML::Parser subclasses which resets all instance data, and I call it
every time I construct a parser, before calling parse().
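
Something along these lines is what I mean by reset().  This is only a
sketch, not lifted from real code: the keys foo_text and foo_count are
made-up names, and it assumes the state lives in the parser object
itself rather than in file-scoped lexicals.  It relies on the version 2
style method callbacks you get when new() is called with no arguments.

```perl
package Foo::Parser;
use strict;
use warnings;
use HTML::Parser;
our @ISA = qw(HTML::Parser);

# Clear everything this subclass accumulates (hypothetical keys).
sub reset {
    my ($self) = @_;
    $self->{foo_text}  = '';
    $self->{foo_count} = 0;
    return $self;
}

# Version 2 style callback: invoked as $self->text($text, $is_cdata).
sub text {
    my ($self, $text) = @_;
    $self->{foo_text} .= $text;
    $self->{foo_count}++;
}

package main;

# Reset right after construction, before the first parse().
my $p = Foo::Parser->new->reset;
$p->parse('<p>first</p>');
$p->eof;

# Reset again before reusing the same parser on another document.
$p->reset;
$p->parse('<p>second</p>');
$p->eof;
print $p->{foo_text}, "\n";
```

Keeping the data in the object means a fresh parser starts clean anyway,
and reset() covers the case where one parser is reused.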

Again, not sure if this is what people have been running into, but
thought it might be worth mentioning.  Best of luck, people.


Mike Henderson wrote:

>I think it's pretty safe to say there are definitely some issues with
>HTML::Parser and mod_perl, at least when subclassing it.
> I managed to kludge around the problem by not doing that -- i.e. not doing:
>package PackageName;
>use HTML::Parser;
>@PackageName::ISA = qw(HTML::Parser);
>I ended up using a somewhat different approach, something like:
> ---
>package PackageName;
>use HTML::Parser;
>sub new {
>my $SELF_PackageName = bless {}, shift;
>$SELF_PackageName->{parser} = HTML::Parser->new( api_version => 3,
>start_h => [\&start, "self, tagname, attr, attrseq, text"],
>end_h => [\&end, "self, tagname, text" ],
>text_h => [\&text, "self, text, is_cdata"] );
>return $SELF_PackageName;
>}
>sub parse_file { shift->{parser}->parse_file(@_); }
>sub start { ... }
>sub end { ... }
>sub text { ... }
> It got a bit weird after that, as the HTML::Parser callbacks pass the
>instance of the actual HTML::Parser object back to the PackageName routines,
>and I actually end up storing all of
>my data in the HTML::Parser namespace ... but it works! :) ... and this is
>why we love perl.
> Thanks guys.
>>>Hello, just a quick question...
>>>Has anyone out there successfully deployed HTML::Parser in an apache
>>>1.3.x / mod_perl / HTML::Mason environment (dynamically parsing pages)
>>>I realize that the module itself is kind of crunky, and additionally
>>>an XS module, so, i'm left wondering.
>>>Basically, what i'm seeing is everything working as you'd expect on
>>>the first load of the page which creates and uses an HTML::Parser
>>>object, but, on any subsequent loads from that same apache child,
>>>things are partially broken -- specifically, during parsing, callbacks
>>>to text() don't seem to be happening, but callbacks to start() and
>>>end() seem to work fine.
>>>I'm wondering if there's any way around this -- that is, any way to
>>>completely destroy any previous data that HTML::Parser is letting
>>>linger that's causing a problem, and reloading the module. Not sure
>>>about the feasibility of this due to it being XS.
>>I have seen odd behavior using Netscape::Bookmarks (which uses
>>HTML::Parse to parse the file) under mod_perl 1.3.x and Mason. I
>>thought it was my code maybe, but what you are saying reminds me that
>>we got garbage back sometimes from a parse.
>>Barry Hoggard


