Re: [Chicken-users] handling XML

Peter Bex Wed, 13 Mar 2013 11:12:28 -0700

On Wed, Mar 13, 2013 at 04:31:17PM +0000, Ivan Shmakov wrote:
>       One of the
>       flaws of the current algorithm is that it's “code-driven,” and
>       not “data-driven”: instead of walking over the XML template and
>       invoking the code as soon as a “marked” element is found — think
>       of (xml-map (lambda (node) (cond (… (else node)))) dom) — or
>       perhaps similarly with mutation — the code explicitly /searches/
>       for the elements matching certain XPath expressions (hardcoded,
>       too), one by one, and performs the respective edits to the tree.


"That's awful!" ;)
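
For contrast, here is a rough sketch of the "data-driven" walk described above: visit every element once and let a callback decide what to do with it. It uses Python's standard xml.etree.ElementTree purely for illustration, not SSAX or any Chicken egg. The names xml_map and fill_marked, the "data-fill" marker attribute, and the VALUES table are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data to splice into the template.
VALUES = {"title": "Hello", "body": "World"}

def xml_map(fn, node):
    """Rebuild the tree bottom-up, passing each rebuilt element through fn."""
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text, copy.tail = node.text, node.tail
    for child in node:
        copy.append(xml_map(fn, child))
    return fn(copy)

def fill_marked(node):
    # Assumed convention: a "data-fill" attribute marks an element to edit.
    key = node.attrib.pop("data-fill", None)
    if key is not None:
        node.text = VALUES.get(key, "")
    return node  # unmarked elements pass through unchanged

template = ET.fromstring(
    '<page><h1 data-fill="title"/><p data-fill="body"/><hr/></page>')
result = ET.tostring(xml_map(fill_marked, template), encoding="unicode")
print(result)
```

The template is traversed exactly once, and adding a new kind of marked element means adding a branch to the callback rather than another hardcoded search.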

>       … BTW, is there a libxml binding for Chicken?  Unless I be
>       mistaken, the library is widely considered to be the fastest XML
>       engine currently in existence.

There is no binding for it.  There used to be one for expat.  I once
benchmarked various XML libraries, and SSAX was only slightly slower
than the C ones (at least compared to Ruby's expat and libxml C bindings
and to expat in Chicken).  Oleg also explained to me that he's using SSAX
in production and that its performance is always more than enough.

So I'd suggest using SSAX unless you run into such an insane bottleneck
that you have to use something written in C, or you need special features
which SSAX doesn't offer.  Otherwise, writing the bindings is not worth
the effort.  AFAICT SSAX doesn't have the entity expansion and external
reference vulnerabilities found recently in various other libraries.

>  > The disadvantage of that notation is of course the opportunity for
>  > generating ill-formed XML, unless you run some kind of parsing step
>  > over it and raise an error as soon as you encounter bad
>  > nesting/syntax.
> 
>       It's my understanding that a standards-compliant XML parser is
>       /required/ to raise an error given an XML document that isn't
>       well-formed.  (Obviously, it doesn't make sense to claim that
>       an “XML-based processing” is in use, unless the input XML
>       document is indeed parsed first.)

Right, that's why I wanted to express this concern.  With proper parsing
there might be a bit more overhead, but it would definitely make things
easier to debug.
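
As a minimal sketch of what "raise an error as soon as you encounter bad nesting" could look like, here is a check that runs generated markup through a strict XML parser before it is sent out. It uses Python's standard xml.etree.ElementTree as a stand-in, not SSAX. The helper name check_well_formed and the sample markup are invented for the example.

```python
import xml.etree.ElementTree as ET

def check_well_formed(markup):
    """Return None if markup parses as XML, else the parser's error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None

# Properly nested markup passes.
good = "<form><p>Name</p><input name='a'/></form>"
# Here the closing </p> was mistyped as <p>, so </form> no longer matches.
bad = "<form><p>Name<p><input name='a'/></form>"

good_error = check_well_formed(good)
bad_error = check_well_formed(bad)
print(good_error)
print(bad_error)
```

The parser's message includes the line and column where nesting went wrong, which is usually enough to find the typo without opening a browser.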

Just yesterday at work we had a bug in our PHP codebase (I know, I know)
which only manifested in IE7 (not 8 or 9).  It turned out that a small
typo, a </p> closing tag that read <p> instead, caused IE to not see
*one input* (the last one, IIRC) of the form which followed, which was
inside a table.  That's got to be the hairiest validation-related problem
I've seen to date.  By now I've seen enough of those stupid problems, and
even though I've become quite good at debugging them, the time it takes
is still wasted time.  Raising an exception immediately on reading
that ill-formed tag would've caught it much earlier and saved us some
grief (we only found out about the bug after several would-be customers
complained that they couldn't fill out the form because the CAPTCHA never
validated correctly!)

Sorry for the war story :)

Cheers,
Peter
-- 
http://www.more-magic.net
