Gabriel Sechan wrote:
> Because XML is the Emperor's New Clothes of computer technology. Some
> people praise it to the heavens as the solution to everything. In
> reality, it does next to nothing. We've had parser generators since the
> 70s, at least. And parsing is the *easy* part of dealing with data.
And we've had s-expressions for longer than that. So why are there so
many programs that *still* can't cope with any of the following
"!@#$%^&*()_+-={}[],.<>" in data? As a fun exercise, inject a NUL
character into *anything* going into C. Oops. Security hole. Or,
better yet, the number of programs that still have hard-coded string and
buffer lengths. Oops. Stack smash.
> XML doesn't help at all with the hard part- acting on the parsed
> tokens.
Actually, you missed something more fundamental. It forces programmers
to deal with "parsed tokens". That's a *big* step up for most
programmers who normally just "throw a couple regexps down".
> XML is overkill. It tries to swat a mosquito with a sledgehammer. It
> takes a problem that can generally be solved in minutes, and gives you
> hours of fun debugging XML code. I have *never* seen a problem solved
> by XML that couldn't have been done just as easily- if not more so-
> without it.
And yet, somehow, nobody ever *did* solve the problems better or more
easily. Before XML, everybody generated their own binary formats in
spite of the fact that a perfectly good standard existed (ASN.1). In
addition, nobody ever documented these formats, either.
Instead of having to waste time reverse engineering the format, XML puts
the format directly in front of the humans. Yes, normally you use the
machine; however, when you *need* to look at it by eye, you *can*.
> Writing a parser for a spec like XML is hard- that's why most XML
> parsers are buggy. Writing a parser for a small domain language is
> quite easy. It's a simple state machine. For middle-sized languages,
> you have lex and yacc.
The fact that you recommend lex and yacc sums up the problem quite
nicely. There are many superior tools for parsing and lexing, and yet
nobody ever uses them.
At least with XML, those better tools get encoded behind the SAX/DOM APIs.
>> XML *forces* these morons to have to interface with a structured,
>> debugged parser. SAX and DOM have their faults, but at least they
>> [...]
> No it doesn't. They still use regexes as often as not, which is a bad
> thing with XML, since XML is such a top-heavy, corner-case ridden spec.
That isn't just XML. All formats eventually get corner cases.
> And 90% of the time, this boundary case only exists in the parser's
> mind. An additional 9% of the time, the corner case is due to XML
> itself, not to failing to follow the DTD/schema.
Actually, I find those percentages are about reversed, nowadays. YMMV
and all that.
> In 99% of apps, internationalization is overkill. Unless a human is
> meant to be editing the file (such as a config file), it's just a waste
> of CPU power and time.
The problem is that i18n can't be retrofitted easily. Either get it
right up front or suffer forevermore.
> As for being liberal in what you expect being a bad thing- let's try an
> experiment. For the next month, you can only go to webpages that are
> W3C validated, and whose servers put out perfect HTTP. Come back to us
> with how many sites you visited. I'll be impressed if you could make
> double digits.
> There's a reason most real world programs are liberal with inputs- they
> have to be. You can't expect the other guy to get his shit right,
> especially if he's not employed with you. And failing to the end user
> is not a good option, not when the error can be routed around.
Let's try a different experiment. Let's let TCP fill in zeros for
packets it knows the size of but can't finish receiving and see how many
files you get transferred.
I get to single digits but I *know* my files and web pages are correct.
You wind up with randomly corrupted files.
TCP demands *perfect* packets for a reason.
Being liberal is *not* always a good thing. Being pedantic is necessary
when doing data interchange.
The only reason why being liberal with HTML works is that the end
consumer (a human) is doing error correction in the wetware using the
redundant information of language.
> And you could have saved yourself a lot of work in 99% of cases by not
> using XML, and not having to worry about the nasty gnarly stuff at all.
> Just write a language that does what you need, no more no less.
And my solution eventually evolves to be just as complex as XML. I'd
rather start at the debugged endpoint, thanks.
-a
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg