Gabriel Sechan wrote:

Because XML is the Emperor's New Clothes of computer technology. Some people praise it to the heavens as the solution to everything. In reality, it does next to nothing. We've had parser generators since the 70s, at least. And parsing is the *easy* part of dealing with data.

And we've had s-expressions for longer than that. So why are there so many programs that *still* can't cope with any of the following "[EMAIL PROTECTED]&*()_+-={}[],.<>" in data? As a fun exercise, inject a NUL character into *anything* going into C. Oops. Security hole. Or, better yet, count the number of programs that still have hard-coded string and buffer lengths. Oops. Stack smash.

XML doesn't help at all with the hard part- acting on the parsed tokens.

Actually, you missed something more fundamental. It forces programmers to deal with "parsed tokens". That's a *big* step up for most programmers who normally just "throw a couple regexps down".

XML is overkill. It tries to swat a mosquito with a sledgehammer. It takes a problem that can generally be solved in minutes, and gives you hours of fun debugging XML code. I have *never* seen a problem solved by XML that couldn't have been done just as easily- if not more so- without it.

And yet, somehow, nobody ever *did* solve the problems better or more easily. Before XML, everybody generated their own binary formats in spite of the fact that a perfectly good standard existed (ASN.1). In addition, nobody ever documented these formats, either.

Instead of having to waste time reverse engineering the format, XML puts the format directly in front of the humans. Yes, normally you use the machine; however, when you *need* to look at it by eye, you *can*.
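A hypothetical fragment makes the point (the record format here is invented): even someone who has never seen the producing program can open this in a text editor and understand it, which is the whole contrast with an undocumented binary layout.

```xml
<!-- Hypothetical record, for illustration only. -->
<order id="1042">
  <customer>A. Nobody</customer>
  <item sku="X-17" qty="3"/>
</order>
```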

Writing a parser for a spec like XML is hard- that's why most XML parsers are buggy. Writing a parser for a small domain language is quite easy. It's a simple state machine. For middle-sized languages, you have lex and yacc.

The fact that you recommend lex and yacc sums up the problem quite nicely. There are many superior tools for parsing and lexing, and yet nobody ever uses them.

At least with XML, those better tools get encoded behind the SAX/DOM APIs.

XML *forces* these morons to have to interface with a structured, debugged parser. SAX and DOM have their faults, but at least they're structured and debugged.


No it doesn't. They still use regexes as often as not, which is a bad thing with XML, since XML is such a top-heavy, corner-case-ridden spec.

That isn't just XML. All formats eventually get corner cases.

And 90% of the time, this boundary case only exists in the parser's mind. An additional 9% of the time, the corner case is due to XML itself rather than a failure to follow the DTD/schema.

Actually, I find those percentages are about reversed, nowadays. YMMV and all that.

In 99% of apps, internationalization is overkill. Unless a human is meant to be editing the file (such as a config file), it's just a waste of CPU power and time.

The problem is that i18n can't be retrofitted easily. Either get it right up front or suffer forevermore.

As for being liberal in what you expect being a bad thing- let's try an experiment. For the next month, you can only go to webpages that are W3C-validated, and whose servers put out perfect HTTP. Come back to us with how many sites you visited. I'll be impressed if you can make double digits.

There's a reason most real world programs are liberal with inputs- they have to be. You can't expect the other guy to get his shit right, especially if he's not employed by you. And failing in front of the end user is not a good option, not when the error can be routed around.

Let's try a different experiment. Let's let TCP fill in zeros for packets it knows the size of but can't finish receiving and see how many files you get transferred.

I may only get to single digits, but I *know* my files and web pages are correct. You wind up with randomly corrupted files.

TCP demands *perfect* packets for a reason.

Being liberal is *not* always a good thing. Being pedantic is necessary when doing data interchange.

The only reason why being liberal with HTML works is that the end consumer (a human) is doing error correction in the wetware using the redundant information of language.

And you could have saved yourself a lot of work in 99% of cases by not using XML, and not having to worry about the nasty gnarly stuff at all. Just write a language that does what you need, no more, no less.

And my solution eventually evolves to be just as complex as XML. I'd rather start at the debugged endpoint, thanks.

-a

--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
