[Boston.pm] XML Parsing and Named Entities...

Jayson DeLancey Sun, 02 Nov 2003 19:35:43 -0800

A few weeks ago I was happily parsing XML files that contained named
entities (&nbsp;, &rsquo;, &eacute;, etc.).  The XML files contain a DTD
reference which, on disk, is a link to a DTD document that in turn
references an entities file.  This all worked well on a pretty standard
RH6 machine with Perl 5.005, XML::DOM 1.30, and XML::Parser 2.30.


Now, following an upgrade to RH8 with Perl 5.8.0, XML::DOM 1.43, and
XML::Parser 2.31, I am not having as much luck.  The entities are no
longer recognized, and the parse now dies on them.

There are a number of scripts and programs based on these modules, so I'm
looking for something that doesn't require a complete rewrite.  My
suspicion is that the entities file was never found before, and that the
default behavior was to use LWP or some other source to resolve these
common named entities.  Perhaps that's my biggest question: how did it
work before?  I can't isolate the difference between XML::Parser 2.30 and
2.31 to determine what has changed, or what it might be calling that has
changed.

One alternative I've found is to pass ParseParamEnt as a parser option,
which forces the DTD and entities files to be read, but this has other
side effects, such as loading all the definitions into the current XML
document.  That's not what I have in mind, and it doesn't explain why it
worked before without the entities file.  Transforming the entities into
other notations is not an option either.
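
For reference, the ParseParamEnt approach described above can be sketched
roughly as follows.  This is a minimal illustration, not the actual setup:
the file names (doc.xml, doc.dtd, entities.ent), the single &eacute;
definition, and the temporary directory are all made up for the example.
It uses XML::Parser directly rather than XML::DOM, and it sets NoLWP so
that external entities are resolved from local files only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Parser;

# Hypothetical layout: doc.xml -> doc.dtd -> entities.ent, all in one directory.
my $dir = tempdir(CLEANUP => 1);

sub write_file {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
}

# Entities file: one named entity, standing in for a full HTML entity set.
write_file("$dir/entities.ent", <<'END');
<!ENTITY eacute "&#233;">
END

# DTD that pulls the entities file in through a parameter entity.
write_file("$dir/doc.dtd", <<'END');
<!ENTITY % ents SYSTEM "entities.ent">
%ents;
END

# Document that references the DTD and uses the named entity.
write_file("$dir/doc.xml", <<'END');
<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc>caf&eacute;</doc>
END

my $text = '';
my $parser = XML::Parser->new(
    ParseParamEnt => 1,   # read the external DTD and expand parameter entities
    NoLWP         => 1,   # resolve external entities from files, not via LWP
    ErrorContext  => 2,   # show surrounding lines if the parse fails
    Handlers      => {
        # Collect character data, including expanded entity text.
        Char => sub { my ($expat, $string) = @_; $text .= $string; },
    },
);

# parsefile uses the file name as the base for relative system identifiers.
$parser->parsefile("$dir/doc.xml");

binmode(STDOUT, ':utf8');
print "$text\n";
```

Without ParseParamEnt the external DTD is not read at all, so the
&eacute; reference in this sketch would be left undefined.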

I feel like I've just read the wrong documents or am failing to see
something obvious.  What are others doing to parse their documents when
named HTML entities are involved?

Thanks for any help.

-Lance
