From: Andrew Lentvorski <[EMAIL PROTECTED]>
Everybody whines about XML. I don't understand why.
Because XML is the Emperor's New Clothes of computer technology. Some
people praise it to the heavens as the solution to everything. In reality,
it does next to nothing. We've had parser generators since the 70s, at
least. And parsing is the *easy* part of dealing with data. XML doesn't
help at all with the hard part: acting on the parsed tokens. So in
exchange for doing a minimal amount of work for you, you get a bloated,
CPU- and bandwidth-wasting format with a huge, annoyingly overengineered
spec and a slow-ass parser. Oh, and you get to put XML on your buzzword page.
XML is overkill. It tries to swat a mosquito with a sledgehammer. It takes
a problem that can generally be solved in minutes, and gives you hours of
fun debugging XML code. I have *never* seen a problem solved by XML that
couldn't have been done just as easily, if not more so, without it.
Oh, and for those who scream that XML is magically open and anyone can now
read any format: no, you can't. It's just as easy to come up with a
convoluted, proprietary schema as it is with any other format. You buy
yourself nothing there.
Actually, I do:
First, parsers are *hard*. Every idiot CS major thinks he can write a
parser for his "little language". They are all wrong.
Writing a parser for a spec like XML is hard; that's why most XML parsers
are buggy. Writing a parser for a small domain language is quite easy. It's
a simple state machine. For middle-sized languages, you have lex and yacc.
The hard part of parsing data isn't the parsing; it's dealing with the
tokens after they're parsed, and designing a good language to begin with.
XML helps with neither of those activities. You still need to deal with the
tokens, and you still need to design a good schema. The second is perhaps
the biggest problem: when you see buggy non-XML parsers, chances are the
language spec is too convoluted. Of course, if they changed it to XML tags
it wouldn't be magically better; you'd still have a convoluted schema,
wrapped in tags.
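To make the "simple state machine" claim concrete, here is a minimal sketch in Python for a hypothetical `key = value` config format with `#` comments. The format, the `parse_config` function and the `State` enum are illustrative assumptions, not anything from this thread. Lines without `=` are silently skipped and there is no escaping, so a `#` cannot appear inside a value.

```python
from enum import Enum, auto


class State(Enum):
    KEY = auto()      # reading characters before '='
    VALUE = auto()    # reading characters after '='
    COMMENT = auto()  # ignoring everything up to end of line


def parse_config(text: str) -> dict[str, str]:
    """Parse hypothetical 'key = value' lines with '#' comments."""
    result: dict[str, str] = {}
    state = State.KEY
    key: list[str] = []
    value: list[str] = []
    seen_eq = False
    for ch in text + "\n":  # trailing newline flushes the last line
        if ch == "\n":
            k = "".join(key).strip()
            if seen_eq and k:  # only record lines that had a key and '='
                result[k] = "".join(value).strip()
            key.clear()
            value.clear()
            seen_eq = False
            state = State.KEY
        elif state is State.COMMENT:
            continue
        elif ch == "#":
            state = State.COMMENT
        elif state is State.KEY and ch == "=":
            seen_eq = True
            state = State.VALUE
        elif state is State.KEY:
            key.append(ch)
        else:  # State.VALUE: later '=' characters belong to the value
            value.append(ch)
    return result


print(parse_config("# comment\nname = Gabe\nlist = kplug-lpsg  # trailing\n"))
# {'name': 'Gabe', 'list': 'kplug-lpsg'}
```

Once quoting, nesting or escapes enter the picture, this is where lex and yacc (or another parser generator) start to earn their keep.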
XML *forces* these morons to have to interface with a structured, debugged
parser. SAX and DOM have their faults, but at least they get debugged.
No, it doesn't. They still use regexes as often as not, which is a bad thing
with XML, since XML is such a top-heavy, corner-case-ridden spec.
Watching programmers writhe in agony because the XML parser threw an
exception on a boundary case that their puny little minds are too narrow to
anticipate is a most rewarding experience.
And 90% of the time, this boundary case only exists in the parser's mind. An
additional 9% of the time, the corner case is due to XML itself, not to a
failure to follow the DTD/schema.
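A small Python illustration of the regex problem, using the standard library's `xml.etree.ElementTree`. The sample document is made up: it contains a commented-out `<title>` and an entity reference, two corner cases that a regex over the raw text knows nothing about.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a commented-out draft title, then the real title
# containing an escaped ampersand.
DOC = """<book>
  <!-- <title>Old Draft Title</title> -->
  <title>Fish &amp; Chips</title>
</book>"""

# Naive approach: grab the first <title>...</title> pair in the raw text.
naive = re.search(r"<title>(.*?)</title>", DOC).group(1)

# Parser approach: ElementTree's default tree builder drops comments
# and decodes entity references.
parsed = ET.fromstring(DOC).find("title").text

print(naive)   # Old Draft Title
print(parsed)  # Fish & Chips
```

The parser gets both cases right precisely because it implements the spec's corner cases, which is the trade-off being argued over here.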
Second, internationalization is hard. How many ways are there to spell
Tchaikovsky? The same morons from above get *forced* into dealing with
this kind of crud with XML when they bump into another program which
refuses to accept that Author, Composer, etc. is a unique key. Oops. And
the whole fact that XML *specifies* Unicode is beautiful--no more slacking
off and only accepting ASCII or, worse, only accepting letters and digits.
In 99% of apps, internationalization is overkill. Unless a human is meant
to be editing the file (such as a config file), it's just a waste of CPU
power and time.
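What "XML specifies Unicode" buys, and what it doesn't, can be sketched with Python's `xml.etree.ElementTree`. The same made-up `<composer>` element, using the French spelling "Tchaïkovski", is encoded once as UTF-8 and once as ISO-8859-1. The parser honours each encoding declaration and returns identical text, but it does nothing to make that spelling match "Tchaikovsky".

```python
import xml.etree.ElementTree as ET

NAME = "Tchaïkovski"  # French spelling; 'ï' is U+00EF, present in ISO-8859-1
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><composer>{name}</composer>'

# The same document serialized two different ways.
utf8_doc = TEMPLATE.format(enc="UTF-8", name=NAME).encode("utf-8")
latin1_doc = TEMPLATE.format(enc="ISO-8859-1", name=NAME).encode("iso-8859-1")

# Passing bytes lets the parser decode according to the declaration.
from_utf8 = ET.fromstring(utf8_doc).text
from_latin1 = ET.fromstring(latin1_doc).text

print(from_utf8 == from_latin1 == NAME)  # True: encodings are handled
print(from_utf8 == "Tchaikovsky")        # False: spellings are not unified
```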
Third, XML parsers *complain* when you feed them garbage. If you don't get
your formatting and nesting correct, most XML parsers are free to dump your
crud into the bitbucket any way they please.
Yup, because just dumping the doc rather than trying to route around the
problem is a great idea. Nah, I didn't really want all that data. So what's
a few missing bank transactions gonna cost anyway?
And herein lies the source of the XML verbosity that everybody complains
about--balanced close tags. Syntax errors almost always *immediately*
cause parsing errors because they tend to bump into unbalanced tags; no
silent degradation here--I approve.
Except they end up just using the self-closing <tagname /> syntax, which has
no close tag to catch a misspelling. Oops, now you have spelling errors and
annoying syntax.
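Both sides of the close-tag argument show up in a few lines of Python with `xml.etree.ElementTree`: a misspelled close tag or a truncated document raises `ParseError`, while the same misspelling in a self-closing tag is perfectly well-formed and sails through. The `is_well_formed` helper and the `<record>`/`<titel>` documents are hypothetical examples; catching the typo in the last case would take a DTD or schema check, which ElementTree does not perform.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if ElementTree can parse `text` as a complete document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<record><titel>X</title></record>"))  # False: mismatched tag
print(is_well_formed("<record><title>X</title>"))           # False: truncated
print(is_well_formed("<record><titel/></record>"))          # True: typo goes unnoticed
```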
The same nitwits who think they can write parsers and can't deal with the
fact that almost nothing in real life is a useful unique key desperately
want XML parsers to be "liberal in what they accept" so that they don't
have to debug their XML generation code. Hogwash! Clap them in irons for
promulgating their dreck amongst the public!
I can think of plenty of things that are useful unique keys in real life.
Space-time coordinates, SSNs, license plates, book title and author, etc.
As for being liberal in what you accept being a bad thing, let's try an
experiment. For the next month, you can only go to webpages that are W3C
validated and whose servers put out perfect HTTP. Come back and tell us how
many sites you visited. I'll be impressed if you make it to double digits.
There's a reason most real-world programs are liberal with input: they
have to be. You can't expect the other guy to get his shit right,
especially if he's not employed by you. And failing in front of the end
user is not a good option, not when the error can be routed around.
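The "route around it" approach, as opposed to dumping the whole document, might look like this for a hypothetical line-oriented `date,account,amount` format. The format, the sample data and the `parse_records` function are illustrative assumptions. Amounts use `Decimal` rather than `float` to avoid binary rounding, and bad lines are handed back with their line numbers instead of being silently dropped.

```python
from decimal import Decimal, InvalidOperation


def parse_records(text: str):
    """Split input into (good records, bad lines) instead of failing outright."""
    good: list[tuple[str, str, Decimal]] = []
    bad: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            bad.append((lineno, line))  # wrong field count: keep for review
            continue
        date, account, amount = fields
        try:
            good.append((date, account, Decimal(amount)))
        except InvalidOperation:
            bad.append((lineno, line))  # unparseable amount: keep for review
    return good, bad


sample = (
    "2024-01-02,checking,100.50\n"
    "garbage line\n"
    "2024-01-03,savings,oops\n"
    "2024-01-04,checking,-20.00"
)
good, bad = parse_records(sample)
print(len(good), bad)
# 2 [(2, 'garbage line'), (3, '2024-01-03,savings,oops')]
```

Whether partial acceptance is acceptable (for bank transactions, say) is a policy decision; the sketch only makes sure nothing is lost without a trace.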
I will happily accept the restrictions that XML places upon me because *I
don't find them to be restrictions*. I wind up putting in the work to deal
with this kind of stuff anyway. I can avoid most of the gnarly, nasty
corners of XML (namespaces and schemas/DTDs) while still retaining most of
the advantages, all while knowing that the gnarly, nasty stuff is available
if I really need it.
And you could have saved yourself a lot of work in 99% of cases by not using
XML, and not having to worry about the nasty, gnarly stuff at all. Just
write a language that does what you need, no more, no less.
Gabe
--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg