From: Andrew Lentvorski <[EMAIL PROTECTED]>
Everybody whines about XML.  I don't understand why.


Because XML is the Emperor's New Clothes of computer technology. Some people praise it to the heavens as the solution to everything. In reality, it does next to nothing. We've had parser generators since the 70s, at least. And parsing is the *easy* part of dealing with data. XML doesn't help at all with the hard part: acting on the parsed tokens. So in exchange for doing a minimal amount of work for you, you get a bloated, CPU- and bandwidth-wasting format with a huge, annoyingly overengineered spec and a slow-ass parser. Oh, and you get to throw in XML on your buzzword page.

XML is overkill. It tries to swat a mosquito with a sledgehammer. It takes a problem that can generally be solved in minutes, and gives you hours of fun debugging XML code. I have *never* seen a problem solved by XML that couldn't have been done just as easily, if not more so, without it.

Oh, and for those who scream that XML is magically open and anyone can now read any format: no, you can't. It's just as easy to come up with a convoluted, proprietary schema in XML as it is in any other format. You buy yourself nothing there.



Actually, I do:

First, parsers are *hard*. Every idiot CS major thinks he can write a parser for his "little language". They are all wrong.

Writing a parser for a spec like XML is hard; that's why most XML parsers are buggy. Writing a parser for a small domain language is quite easy. It's a simple state machine. For middle-sized languages, you have lex and yacc.
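
To make the "simple state machine" claim concrete, here is a minimal sketch in Python. The "key=value; key=value" mini-language and the name parse_pairs are made up for illustration; nothing in this thread specifies them.

```python
def parse_pairs(text):
    """Parse a hypothetical 'key=value; key=value' mini-language.

    Two states only: reading a key, or reading a value.
    """
    KEY, VALUE = 0, 1
    state = KEY
    key, value = [], []
    result = {}
    for ch in text + ";":  # the extra ';' flushes the last pair
        if state == KEY:
            if ch == "=":
                state = VALUE
            elif ch == ";":
                leftover = "".join(key).strip()
                if leftover:
                    raise ValueError("key without value: %r" % leftover)
                key = []
            else:
                key.append(ch)
        else:  # state == VALUE
            if ch == ";":
                name = "".join(key).strip()
                if not name:
                    raise ValueError("value without key")
                result[name] = "".join(value).strip()
                key, value = [], []
                state = KEY
            else:
                value.append(ch)
    return result


config = parse_pairs("host = example.org; port=8080")
```

That is the whole parser. Anything the language doesn't need (escapes, nesting, quoting) simply isn't there, so there are no corner cases for it.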

The hard part of parsing data isn't the parsing. It's dealing with the tokens after they're parsed, and designing a good language to begin with. XML helps with neither of those activities. You still need to deal with the tokens, and you still need to design a good schema. The second is perhaps the bigger problem: when you see buggy non-XML parsers, chances are the language spec is too convoluted. Of course, if they changed it to XML tags it wouldn't be magically better. You'd still have a convoluted schema, wrapped in tags.


XML *forces* these morons to have to interface with a structured, debugged parser. SAX and DOM have their faults, but at least they

No it doesn't. They still use regexes as often as not, which is a bad thing with XML, since XML is such a top-heavy, corner-case-ridden spec.
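
A small sketch of the kind of corner case I mean, using Python's standard re and xml.etree.ElementTree modules. The order document is invented for the example; CDATA is just one of many ways a regex gets tripped up.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample document: the note's text legitimately contains "</note>"
# because it sits inside a CDATA section.
doc = (
    "<order>"
    "<note><![CDATA[see </note> in the spec]]></note>"
    '<total cur="USD">10</total>'
    "</order>"
)

# The quick regex stops at the first "</note>", which is inside the CDATA.
regex_note = re.search(r"<note>(.*?)</note>", doc).group(1)

# A real XML parser understands CDATA and returns the full text.
parsed_note = ET.fromstring(doc).find("note").text

print(repr(regex_note))   # '<![CDATA[see '
print(repr(parsed_note))  # 'see </note> in the spec'
```

The regex doesn't fail loudly here. It quietly returns the wrong string, which is worse.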

get debugged. Watching programmers writhe in agony because the XML parser threw an exception on a boundary case that their puny little minds are too narrow to anticipate is a most rewarding experience.

And 90% of the time, this boundary case only exists in the parser's mind. An additional 9% of the time, the corner case is due to XML itself and not to a failure to follow the DTD/schema.

Second, internationalization is hard. How many ways are there to spell Tchaikovsky? The same morons from above get *forced* into dealing with this kind of crud with XML when they bump into another program which refuses to accept that Author, Composer, etc. is a unique key. Oops. And the whole fact that XML *specifies* Unicode is beautiful--no more slacking off and only accepting ASCII or, worse, only accepting letters and digits.

In 99% of apps, internationalization is overkill. Unless a human is meant to be editing the file (such as a config file), it's just a waste of CPU power and time.


Third, XML parsers *complain* when you feed them garbage. If you don't get your formatting and nesting correct, most XML parsers are free to dump your crud into the bitbucket any way they please.

Yup, because just dumping the doc rather than trying to route around the problem is a great idea. Nah, I didn't really want all that data. So what's a few missing bank transactions gonna cost anyway?

And herein lies the source of the XML verbosity that everybody complains about--balanced close tags. Syntax errors almost always *immediately* cause parsing errors because they tend to bump into unbalanced tags; no silent degradation here--I approve.

Except they end up just using the <tagname /> syntax. Oops, now you have spelling errors and annoying syntax.
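
Both halves of this exchange can be checked with Python's standard xml.etree.ElementTree. This is a rough sketch; the txn and amount element names are made up for the example.

```python
import xml.etree.ElementTree as ET


def well_formed(text):
    """Return True if the parser accepts the text, False on a ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A typo in a close tag unbalances the document, so the parser rejects it.
mismatched = "<txn><amount>10</amuont></txn>"

# The same typo in a self-closing tag is perfectly well-formed XML.
misspelled = '<txn><amuont value="10" /></txn>'

print(well_formed(mismatched))  # False
print(well_formed(misspelled))  # True

# The misspelled element parses fine; the code looking for it finds nothing.
amount = ET.fromstring(misspelled).find("amount")
print(amount)  # None
```

So the balanced-tag check only catches the typo when there is a close tag to mismatch. With the short form, you are back to validating against a schema or checking for missing elements yourself.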

The same nitwits who think they can write parsers and can't deal with the fact that almost nothing in real life is a useful unique key desperately want XML parsers to be "liberal in what they accept" so that they don't have to debug their XML generation code. Hogwash! Clap them in irons for promulgating their dreck amongst the public!

I can think of plenty of things that are useful unique keys in real life. Space-time coordinates, SSNs, license plates, book title and author, etc.

As for being liberal in what you accept being a bad thing: let's try an experiment. For the next month, you can only go to webpages that are W3C validated, and whose servers put out perfect HTTP. Come back to us with how many sites you visited. I'll be impressed if you could make double digits.

There's a reason most real-world programs are liberal with inputs: they have to be. You can't expect the other guy to get his shit right, especially if he's not employed with you. And failing to the end user is not a good option, not when the error can be routed around.
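
Here is a minimal sketch of what "routing around" an error can look like, in Python. The "account, amount" line format and the name parse_transactions are hypothetical. The point is only that one bad record gets set aside and reported instead of taking every good record down with it.

```python
def parse_transactions(lines):
    """Parse hypothetical 'account, amount' records one line at a time.

    Bad lines are collected with their line number and reason instead of
    aborting the whole batch. (float is used for brevity; real money
    handling would want a decimal type.)
    """
    good, rejected = [], []
    for lineno, line in enumerate(lines, start=1):
        parts = [p.strip() for p in line.split(",")]
        try:
            if len(parts) != 2:
                raise ValueError("expected 2 fields, got %d" % len(parts))
            account, amount = parts
            good.append((account, float(amount)))
        except ValueError as exc:
            rejected.append((lineno, line, str(exc)))
    return good, rejected


good, rejected = parse_transactions(["alice, 10.50", "garbage", "bob, -3"])
print(good)      # [('alice', 10.5), ('bob', -3.0)]
print(rejected)  # [(2, 'garbage', 'expected 2 fields, got 1')]
```

Nothing is silently dropped: the rejected list tells you exactly which line to go yell at the other guy about.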


I will happily accept the restrictions that XML places upon me because *I don't find them to be restrictions*. I wind up putting in the work to deal with this kind of stuff anyway. I can avoid most of the gnarly, nasty corners of XML (namespaces and schemas/DTDs) while still retaining most of the advantages, all while knowing that the gnarly, nasty stuff is available if I really need it.

And you could have saved yourself a lot of work in 99% of cases by not using XML, and not having to worry about the nasty, gnarly stuff at all. Just write a language that does what you need, no more, no less.

Gabe


--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
