* Keith C. Ivey <[EMAIL PROTECTED]> [2003-07-15 14:42]:
> which will be handled by the regex but may cause a parser to
> blow up (though some are more tolerant than others)

Did you read what I said? You need a tolerant parser indeed. Did
you take a look at HTML::Parser at all?

> | That leaves input data munging, which I do a lot of, and a
> | lot of input data these days is XML. Now here's the dirty
> | secret; most of it is machine-generated XML,

Is yours?

> | I've even gone to the length of writing a prefilter to glue
> | together tags that got split across multiple lines, just so I
> | could do the regexp trick.

Do you?

Sure, as long as you know your input follows narrower
specifications than "arbitrary valid markup", you can use that
knowledge to your advantage.

The deficiency of parsers is their interfaces. What we really need
is a generic matching engine that can be applied to ordered
collections not only of characters, but of arbitrary objects for
some, so that we could apply a pattern to, say, a stream of XML
parser events.

-- 
Regards,
Aristotle
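To make the "pattern over parser events" idea a bit more concrete, here is a toy sketch. It is written in Python rather than Perl only because Python's standard library ships an event-based parser (html.parser), so it runs without extra modules. Everything else is hypothetical: Event, collect_events, match_at and search are made-up names, not any existing module's API. The matcher supports only two step types, "exactly one event satisfying a predicate" and "zero or more such events", and it uses plain backtracking. It is nowhere near a real engine, but it shows that a tag split across lines is a non-issue once you match events instead of characters.

```python
from html.parser import HTMLParser
from typing import Callable, List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    kind: str  # "start", "end" or "text"
    data: str  # tag name for start/end events, stripped character data for text


class EventCollector(HTMLParser):
    """Hypothetical helper: flatten markup into a list of Event objects."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[Event] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(Event("start", tag))

    def handle_endtag(self, tag):
        self.events.append(Event("end", tag))

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text between tags
            self.events.append(Event("text", data.strip()))


def collect_events(markup: str) -> List[Event]:
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


Pred = Callable[[Event], bool]
Step = Tuple[Pred, str]  # (predicate, "one" | "many"); "many" means zero or more


def match_at(pattern: List[Step], events: List[Event], i: int) -> Optional[int]:
    """Return the end index if the whole pattern matches starting at i, else None."""
    if not pattern:
        return i
    pred, quant = pattern[0]
    rest = pattern[1:]
    if quant == "one":
        if i < len(events) and pred(events[i]):
            return match_at(rest, events, i + 1)
        return None
    # "many": greedily consume matching events, then back off one at a time
    j = i
    while j < len(events) and pred(events[j]):
        j += 1
    while j >= i:
        end = match_at(rest, events, j)
        if end is not None:
            return end
        j -= 1
    return None


def search(pattern: List[Step], events: List[Event]) -> Optional[List[Event]]:
    """Return the leftmost matching slice of events, or None."""
    for begin in range(len(events) + 1):
        end = match_at(pattern, events, begin)
        if end is not None:
            return events[begin:end]
    return None


def is_start(tag: str) -> Pred:
    return lambda e: e.kind == "start" and e.data == tag


def is_end(tag: str) -> Pred:
    return lambda e: e.kind == "end" and e.data == tag


def is_text(e: Event) -> bool:
    return e.kind == "text"


# The <a> tag is split across two lines; the parser does not care.
markup = '<p>Hi <a\n href="x"><b>bold\nlink</b></a></p>'
events = collect_events(markup)

# "an <a>, any amount of text, then a <b> element containing one text run"
pattern: List[Step] = [
    (is_start("a"), "one"),
    (is_text, "many"),
    (is_start("b"), "one"),
    (is_text, "one"),
    (is_end("b"), "one"),
]
found = search(pattern, events)
print(found)
```

The design choice worth noting is that the pattern elements are predicates rather than characters. That is the whole point: the same backtracking loop a regex engine runs over a string works unchanged over any ordered sequence of objects.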
