Well, I waited several minutes without a response, so I fixed it myself :>
There are several bugs/misfeatures/whatever in LibXML.xs (along with a
bug or two in my calling script...)
Warning: I'm only slightly familiar with the C API of libxml2, and
even less with Perl's XS, but the following should be close to right.
This outlines my changes (I've attached diff -u LibXML.xs.orig LibXML.xs)
1) In parse_string, the decision whether to report validation errors
depends on whether xmlDoValidityCheckingDefaultValue is true, but it is
tested AFTER LibXML_cleanup_parser has already reset it to off!
So, even if you have $parser->validation(1), you'll only see
well-formedness errors, never validation errors.
I simply saved the value to check against before the cleanup, but
it is interesting that parse_fh and parse_file decide to throw
errors if the document comes back NULL. This approach is certainly
simpler and probably works OK for XML (but not HTML! see below!)
2) In parse_string and parse_fh, you have a nice optional argument
directory. I'm not sure what putting the directory in the parsing
context does, but if what's passed is really the filename (a URL,
actually), it can also be stored as real_dom->URL (instead of the
generated "unknown-..." string that you use). This has the benefit that
XInclude processing works as you would expect, since relative hrefs are
resolved against the document URL. At any rate, I go
ahead and stick directory in real_dom->URL, if it's given.
3) In parse_html_(string|fh|file), libxml2 will validate the HTML if
you've got validation(1). However, it _still_ returns a DOM!! Thus LibXML
thinks it succeeded and doesn't croak with the error messages.
I added a test to parse_html_string that checks whether LibXML_error is
non-empty and croaks if so, but this seems clunky, so I didn't add it to
the other 2.
There ought to be a better way....
4) I was trying to get error messages out of is_valid, which I guess is
the wrong thing -- it should just say Yes or No. However, there is an
apparently not-yet-documented method validate() which is intended to
do what I want. The trouble is that both is_valid and validate only set
the error and warning handlers if a DTD was given; without one,
xmlValidateDocument is called with the handlers unset, which leads to
(occasional) segfaults. I moved the handler setup ahead of the DTD check.
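
To make (1) concrete, here is a minimal C sketch of the ordering problem.
It does not use libxml2 at all: validity_checking_default and
cleanup_parser are hypothetical stand-ins for
xmlDoValidityCheckingDefaultValue and LibXML_cleanup_parser, and the
should_croak_* helpers are names invented for illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for libxml2's global validation flag. */
int validity_checking_default = 0;

/* Hypothetical stand-in for LibXML_cleanup_parser: resets the flag to off. */
void cleanup_parser(void)
{
    validity_checking_default = 0;
}

/* Buggy order: the flag is read only after cleanup has cleared it,
 * so an invalid but well-formed document is never reported.
 * Returns 1 if an error should be raised, 0 otherwise. */
int should_croak_buggy(int well_formed, int valid)
{
    cleanup_parser();
    return !well_formed || (validity_checking_default && !valid);
}

/* Fixed order: remember the flag before cleanup, test the saved copy. */
int should_croak_fixed(int well_formed, int valid)
{
    int checkvalid = validity_checking_default; /* save before it is reset */
    cleanup_parser();
    return !well_formed || (checkvalid && !valid);
}

/* Small demonstration: validation requested, document well-formed but invalid. */
void demo(void)
{
    validity_checking_default = 1;
    printf("buggy: %d\n", should_croak_buggy(1, 0));
    validity_checking_default = 1;
    printf("fixed: %d\n", should_croak_fixed(1, 0));
}
```

Calling demo() from a main() shows the difference between the two
orderings. The fixed variant corresponds to the checkvalid variable in
the attached diff; everything else here is simplified.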
After fixing (3), another problem turned up, but I'm not so sure how to
fix it.
My stylesheet has <xsl:output ... encoding='ISO-8859-1' ...
which produces in the html:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
which seems legit, and xmllint never complained. However, using
XML::LibXML, I'm getting the error message:
Entity: line 4: error: xmlSwitchEncoding : no input
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
Presumably it has something to do with LibXML storing everything as UTF-8
internally, and it can't/won't convert to ISO ?? Any suggestions?
Thanks
Bruce Miller wrote:
> Hi Matt and everybody;
> This is really a question about XML::LibXML, rather than AxKit...
> [Is it OK to post it here?]
>
> I can get access to well-formedness error messages by
> my $parser=XML::LibXML->new();
> eval { $doc=$parser->parse_string($xml); };
> and examining $@.
>
> However, validation error messages _don't_ show up in $@
> after
> eval { $valid = $doc->is_valid(); };
> Or anywhere else that I can determine!, although
> the boolean returned seems to be correct.
>
> In the case of HTML,
> eval { $hdoc=$parser->parse_html_string($html);
> $valid=$hdoc->is_valid();
> };
> seems to always claim the doc is invalid (although I'm convinced
> otherwise :>) [and occasionally even seg]
>
> Are these capabilities known not to be (Yet ???) in XML::LibXML,
> or am I doing something wrong?
> [libxml2, via xmllint, can do them all, including validating html]
>
> I'm using libxml 2.4.11, XML::LibXML 1.31, on RH Linux 7.1
> Thanks!
> bruce
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
--- LibXML.xs.orig Wed Dec 5 11:11:27 2001
+++ LibXML.xs Wed Dec 5 15:28:58 2001
@@ -788,12 +788,14 @@
int ret;
xmlDocPtr real_dom;
ProxyObject * proxy;
+ int checkvalid;
CODE:
ptr = SvPV(string, len);
if (len == 0) {
croak("Empty string");
}
-
+/* Save value, since it gets cleared by LibXML_cleanup_parser. */
+ checkvalid=xmlDoValidityCheckingDefaultValue;
ctxt = xmlCreateMemoryParserCtxt(ptr, len);
if (ctxt == NULL) {
croak("Couldn't create memory parser context: %s", strerror(errno));
@@ -823,16 +825,19 @@
real_dom = ctxt->myDoc;
xmlFreeParserCtxt(ctxt);
sv_2mortal(LibXML_error);
- if (!well_formed || (xmlDoValidityCheckingDefaultValue && !valid)) {
+ if (!well_formed || (checkvalid && !valid)) {
xmlFreeDoc(real_dom);
RETVAL = &PL_sv_undef;
croak(SvPV(LibXML_error, len));
}
else {
STRLEN n_a;
- SV * newURI = newSVpvf("unknown-%12.12d", real_dom);
- real_dom->URL = xmlStrdup(SvPV(newURI, n_a));
- SvREFCNT_dec(newURI);
+ if(directory != NULL){
+ real_dom->URL = xmlStrdup(directory);
+ } else {
+ SV * newURI= newSVpvf("unknown-%12.12d", real_dom);
+ real_dom->URL = xmlStrdup(SvPV(newURI, n_a));
+ SvREFCNT_dec(newURI); }
proxy = make_proxy_node( (xmlNodePtr)real_dom );
RETVAL = sv_newmortal();
sv_setref_pv( RETVAL, (char *)CLASS, (void*)proxy );
@@ -869,9 +874,12 @@
}
else {
STRLEN n_a;
- SV * newURI = newSVpvf("unknown-%12.12d", real_dom);
- real_dom->URL = xmlStrdup(SvPV(newURI, n_a));
- SvREFCNT_dec(newURI);
+ if(directory != NULL){
+ real_dom->URL = xmlStrdup(directory);
+ } else {
+ SV * newURI= newSVpvf("unknown-%12.12d", real_dom);
+ real_dom->URL = xmlStrdup(SvPV(newURI, n_a));
+ SvREFCNT_dec(newURI); }
proxy = make_proxy_node( (xmlNodePtr)real_dom );
RETVAL = sv_newmortal();
@@ -963,7 +971,8 @@
sv_2mortal(LibXML_error);
- if (!real_dom) {
+ ptr=SvPV(LibXML_error,len);
+ if (!real_dom || (*ptr!='\0')) {
RETVAL = &PL_sv_undef;
croak(SvPV(LibXML_error, len));
}
@@ -1173,6 +1182,10 @@
SV * dtd_sv;
CODE:
LibXML_error = sv_2mortal(newSVpv("", 0));
+ cvp.userData = (void*)PerlIO_stderr();
+ cvp.error = (xmlValidityErrorFunc)LibXML_validity_error;
+ cvp.warning = (xmlValidityWarningFunc)LibXML_validity_warning;
+
if (items > 1) {
dtd_sv = ST(1);
if ( sv_isobject(dtd_sv) && (SvTYPE(SvRV(dtd_sv)) == SVt_PVMG) ) {
@@ -1184,14 +1197,12 @@
else {
croak("is_valid: argument must be a DTD object");
}
- cvp.userData = (void*)PerlIO_stderr();
- cvp.error = (xmlValidityErrorFunc)LibXML_validity_error;
- cvp.warning = (xmlValidityWarningFunc)LibXML_validity_warning;
RETVAL = xmlValidateDtd(&cvp, self, dtd);
}
else {
RETVAL = xmlValidateDocument(&cvp, self);
}
+
OUTPUT:
RETVAL
@@ -1205,7 +1216,10 @@
SV * dtd_sv;
STRLEN n_a;
CODE:
- LibXML_error = sv_2mortal(newSVpv("", 0));
+ LibXML_error = sv_2mortal(newSVpv("", 0));
+ cvp.userData = (void*)PerlIO_stderr();
+ cvp.error = (xmlValidityErrorFunc)LibXML_validity_error;
+ cvp.warning = (xmlValidityWarningFunc)LibXML_validity_warning;
if (items > 1) {
dtd_sv = ST(1);
if ( sv_isobject(dtd_sv) && (SvTYPE(SvRV(dtd_sv)) == SVt_PVMG) ) {
@@ -1217,14 +1231,12 @@
else {
croak("is_valid: argument must be a DTD object");
}
- cvp.userData = (void*)PerlIO_stderr();
- cvp.error = (xmlValidityErrorFunc)LibXML_validity_error;
- cvp.warning = (xmlValidityWarningFunc)LibXML_validity_warning;
RETVAL = xmlValidateDtd(&cvp, self , dtd);
}
else {
RETVAL = xmlValidateDocument(&cvp, self);
}
+
if (RETVAL == 0) {
croak(SvPV(LibXML_error, n_a));
}