Well, I waited several minutes without a response, so I fixed it myself :>
There are several bugs/misfeatures/whatever in LibXML.xs (along with a
bug or two in my calling script...)
Warning: I'm only slightly familiar with the C API of libxml2, and
even less with Perl's XS, but the following should be close to right.

This outlines my changes (I've attached diff -u LibXML.xs.orig LibXML.xs)
  1) parse_string decides whether to report validation errors by
     testing xmlDoValidityCheckingDefaultValue, but only AFTER
     LibXML_cleanup_parser has already reset it to off!
     So, even with $parser->validation(1), you only see well-formedness
     errors, never validation errors.
     I simply saved the value before the cleanup and test the saved copy.
     Interestingly, parse_fh and parse_file instead throw an error only
     if the document comes back NULL.  That approach is certainly
     simpler and probably works OK for XML (but not HTML! see (3) below!)

  2) parse_string and parse_fh take a nice optional argument,
     directory.  I'm not sure what putting the directory in the parsing
     context does, but if it is given as the filename (a URL, actually),
     it can also be stored as real_dom->URL (instead of the generated
     "unknown-..." string that is used now).  This has the benefit that
     XInclude processing works as you would expect, since relative
     references are resolved against the document URL.  At any rate, I
     go ahead and store directory in real_dom->URL, if it's given.

  3) In parse_html_(string|fh|file), libxml2 will validate the HTML if
     you've set validation(1).  However, it _still_ returns a DOM!!
     Thus LibXML thinks it succeeded and doesn't croak with the error
     messages.
     I added a test to parse_html_string that croaks if LibXML_error is
     non-empty, but this seems clunky, so I didn't add it to the other 2.
     There ought to be a better way....

  4) I was trying to get error messages out of is_valid, which I guess
     is the wrong thing -- it should just say Yes or No.  However, there
     is an apparently not-yet-documented method validate() which is
     intended to do what I want.  The catch is that both is_valid and
     validate set the error and warning handlers only if a DTD was
     given, which leads to (occasional) segfaults when none is.  I now
     set the handlers before the check for a DTD argument.
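
The handler problem in (4) can be sketched in plain C.  This is only an
illustration, not the XS code: valid_ctxt, collect_message,
validate_against_dtd, validate_document and this is_valid are made-up
stand-ins for libxml2's validation context and the real validators.  The
point is that the callbacks are filled in before the branch on whether a
DTD was given, so neither path calls through an unset pointer.

```c
#include <stddef.h>
#include <string.h>

typedef void (*report_fn)(void *user_data, const char *msg);

/* Hypothetical stand-in for libxml2's validation context:
 * only the callback slots matter for this sketch. */
typedef struct {
    void     *user_data;
    report_fn error;
    report_fn warning;
} valid_ctxt;

/* Collected messages, playing the role of LibXML_error. */
static char error_buf[256];

/* Append a message to error_buf, truncating instead of overflowing. */
static void collect_message(void *user_data, const char *msg)
{
    (void)user_data;
    strncat(error_buf, msg, sizeof(error_buf) - strlen(error_buf) - 1);
}

/* Stand-in validators: each reports one error through the context. */
static int validate_against_dtd(valid_ctxt *cvp)
{
    cvp->error(cvp->user_data, "dtd check failed\n");
    return 0;
}

static int validate_document(valid_ctxt *cvp)
{
    cvp->error(cvp->user_data, "document check failed\n");
    return 0;
}

/* Handlers are set up BEFORE branching, so both paths can report. */
static int is_valid(int have_dtd)
{
    valid_ctxt cvp;

    error_buf[0] = '\0';
    cvp.user_data = NULL;
    cvp.error     = collect_message;
    cvp.warning   = collect_message;

    return have_dtd ? validate_against_dtd(&cvp)
                    : validate_document(&cvp);
}
```

Calling is_valid(0) or is_valid(1) returns 0 either way and leaves the
corresponding message in error_buf.  If the three assignments were moved
inside the DTD-only branch, the no-DTD path would call through an
uninitialized function pointer.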

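The ordering bug in (1) comes down to reading a global flag after the
cleanup routine has reset it.  A minimal sketch in plain C, with made-up
names: do_validity_checking stands in for
xmlDoValidityCheckingDefaultValue, cleanup_parser for
LibXML_cleanup_parser, and the two should_croak_* functions are purely
illustrative.

```c
/* Hypothetical stand-in for the libxml2 global validation flag. */
static int do_validity_checking = 0;

/* Stand-in for the cleanup routine, which resets the flag to off. */
static void cleanup_parser(void)
{
    do_validity_checking = 0;
}

/* Buggy order: the flag is tested after cleanup, so it is always 0
 * and validation failures are never reported. */
static int should_croak_buggy(int well_formed, int valid)
{
    cleanup_parser();
    return !well_formed || (do_validity_checking && !valid);
}

/* Fixed order: remember the flag first, then clean up, then test
 * the saved copy. */
static int should_croak_fixed(int well_formed, int valid)
{
    int checkvalid = do_validity_checking;

    cleanup_parser();
    return !well_formed || (checkvalid && !valid);
}
```

With the flag set to 1 and a well-formed but invalid document, the buggy
version reports no error while the fixed version does.
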
After fixing (3), another problem turned up, but I'm not so sure how to
fix it.
My stylesheet has <xsl:output ... encoding='ISO-8859-1' ...
which produces in the html:
  <meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
which seems legit, and xmllint never complained.  However, using
XML::LibXML, I'm getting the error message:
  Entity: line 4: error: xmlSwitchEncoding : no input
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">

Presumably it has something to do with LibXML storing everything as UTF-8
internally, and it can't/won't convert to ISO-8859-1 ??  Any suggestions?


Thanks

Bruce Miller wrote:

> Hi Matt and everybody;
>   This is really a question about XML::LibXML, rather than AxKit...
> [Is it OK to post it here?]
> 
> I can get access to well-formedness error messages by
>  my $parser=XML::LibXML->new();
>  eval { $doc=$parser->parse_string($xml); };
> and examining $@.
> 
> However, validation error messages _don't_ show up in $@
> after
>  eval { $valid = $doc->is_valid(); };
> or anywhere else that I can determine, although
> the boolean returned seems to be correct.
> 
> In the case of HTML,
>  eval { $hdoc=$parser->parse_html_string($html);
>         $valid=$hdoc->is_valid();
>   };
> always seems to claim the doc is invalid (although I'm convinced
> otherwise :>) [and occasionally even segfaults]
> 
> Are these capabilities known not to be (Yet ???) in XML::LibXML,
> or am I doing something wrong?
> [libxml2, via xmllint, can do them all, including validating html]
> 
> I'm using libxml 2.4.11, XML::LibXML 1.31, on RH Linux 7.1
> Thanks!
> bruce
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

--- LibXML.xs.orig      Wed Dec  5 11:11:27 2001
+++ LibXML.xs   Wed Dec  5 15:28:58 2001
@@ -788,12 +788,14 @@
         int ret;
         xmlDocPtr real_dom;
         ProxyObject * proxy;
+        int checkvalid; 
     CODE:
         ptr = SvPV(string, len);
         if (len == 0) {
             croak("Empty string");
         }
-
+/* Save value, since it gets cleared by LibXML_cleanup_parser. */
+       checkvalid=xmlDoValidityCheckingDefaultValue;
         ctxt = xmlCreateMemoryParserCtxt(ptr, len);
         if (ctxt == NULL) {
             croak("Couldn't create memory parser context: %s", strerror(errno));
@@ -823,16 +825,19 @@
         real_dom = ctxt->myDoc;
         xmlFreeParserCtxt(ctxt);
         sv_2mortal(LibXML_error);
-        if (!well_formed || (xmlDoValidityCheckingDefaultValue && !valid)) {
+        if (!well_formed || (checkvalid && !valid)) {
             xmlFreeDoc(real_dom);
             RETVAL = &PL_sv_undef;    
             croak(SvPV(LibXML_error, len));
         }
         else {
             STRLEN n_a;
-            SV * newURI = newSVpvf("unknown-%12.12d", real_dom);
-            real_dom->URL = xmlStrdup(SvPV(newURI, n_a));
-            SvREFCNT_dec(newURI);
+           if(directory != NULL){
+             real_dom->URL = xmlStrdup(directory);
+           } else {
+             SV * newURI= newSVpvf("unknown-%12.12d", real_dom);
+             real_dom->URL = xmlStrdup(SvPV(newURI, n_a));
+             SvREFCNT_dec(newURI); }
             proxy = make_proxy_node( (xmlNodePtr)real_dom ); 
             RETVAL = sv_newmortal();
             sv_setref_pv( RETVAL, (char *)CLASS, (void*)proxy );
@@ -869,9 +874,12 @@
         }
         else {
             STRLEN n_a;
-            SV * newURI = newSVpvf("unknown-%12.12d", real_dom);
-            real_dom->URL = xmlStrdup(SvPV(newURI, n_a));
-            SvREFCNT_dec(newURI);
+           if(directory != NULL){
+             real_dom->URL = xmlStrdup(directory);
+           } else {
+             SV * newURI= newSVpvf("unknown-%12.12d", real_dom);
+             real_dom->URL = xmlStrdup(SvPV(newURI, n_a));
+             SvREFCNT_dec(newURI); }
             proxy = make_proxy_node( (xmlNodePtr)real_dom ); 
 
             RETVAL = sv_newmortal();
@@ -963,7 +971,8 @@
 
         sv_2mortal(LibXML_error);
         
-        if (!real_dom) {
+        ptr=SvPV(LibXML_error,len);
+        if (!real_dom || (*ptr!='\0')) {
             RETVAL = &PL_sv_undef;    
             croak(SvPV(LibXML_error, len));
         }
@@ -1173,6 +1182,10 @@
         SV * dtd_sv;
     CODE:
         LibXML_error = sv_2mortal(newSVpv("", 0));
+        cvp.userData = (void*)PerlIO_stderr();
+        cvp.error = (xmlValidityErrorFunc)LibXML_validity_error;
+        cvp.warning = (xmlValidityWarningFunc)LibXML_validity_warning;
+
         if (items > 1) {
             dtd_sv = ST(1);
             if ( sv_isobject(dtd_sv) && (SvTYPE(SvRV(dtd_sv)) == SVt_PVMG) ) {
@@ -1184,14 +1197,12 @@
             else {
                 croak("is_valid: argument must be a DTD object");
             }
-            cvp.userData = (void*)PerlIO_stderr();
-            cvp.error = (xmlValidityErrorFunc)LibXML_validity_error;
-            cvp.warning = (xmlValidityWarningFunc)LibXML_validity_warning;
             RETVAL = xmlValidateDtd(&cvp, self, dtd);
         }
         else {
             RETVAL = xmlValidateDocument(&cvp, self);
         }
+
     OUTPUT:
         RETVAL
 
@@ -1205,7 +1216,10 @@
         SV * dtd_sv;
         STRLEN n_a;
     CODE:
-        LibXML_error = sv_2mortal(newSVpv("", 0));
+       LibXML_error = sv_2mortal(newSVpv("", 0));
+        cvp.userData = (void*)PerlIO_stderr();
+        cvp.error = (xmlValidityErrorFunc)LibXML_validity_error;
+        cvp.warning = (xmlValidityWarningFunc)LibXML_validity_warning;
         if (items > 1) {
             dtd_sv = ST(1);
             if ( sv_isobject(dtd_sv) && (SvTYPE(SvRV(dtd_sv)) == SVt_PVMG) ) {
@@ -1217,14 +1231,12 @@
             else {
                 croak("is_valid: argument must be a DTD object");
             }
-            cvp.userData = (void*)PerlIO_stderr();
-            cvp.error = (xmlValidityErrorFunc)LibXML_validity_error;
-            cvp.warning = (xmlValidityWarningFunc)LibXML_validity_warning;
             RETVAL = xmlValidateDtd(&cvp, self , dtd);
         }
         else {
             RETVAL = xmlValidateDocument(&cvp, self);
         }
+
         if (RETVAL == 0) {
             croak(SvPV(LibXML_error, n_a));
         }
