Chas Emerick wrote:

>> and keep patting ourselves on the back, while the rest of the world
>> is busy routing around us, switching to well-understood XML subsets
>> or other serialization formats, simpler and more flexible data
>> models, simpler API:s, and more robust code. and Python ;-)
>
> That's flatly unrealistic. If you'll remember, I'm not one of "those
> people" that are specification-driven -- I hadn't even *heard* of
> Infoset until earlier this week!
The rant wasn't directed at you or anyone special, but I don't really think you got the point of it either. Which is a bit strange, because it sounded like you *were* working on extracting information from messy documents, so the "it's about the data, dammit" way of thinking shouldn't be news to you.

And the routing around is not unrealistic, it is a *fact*: JSON and POX are killing the full XML/Schema/SOAP stack for communication, XHTML is pretty much dead as a wire format, people are apologizing in public for their use of SOAP, AJAX is quickly turning into AJAJ, few people care about the more obscure details of the XML 1.0 standard (when did you last see a conditional section? or even a DTD?), dealing with huge XML data sets is still extremely hard compared to just uploading the darn thing to a database and doing the crunching in SQL, and nobody uses XML 1.1 for anything. Practicality beats purity, and the Internet routes around damage, every single time.

> overwhelming majority of the developers out there care for nothing
> but the serialization, simply because that's how one plays nicely
> with others.

The problem is that if you only stare at the serialization, your code *won't* play nicely with others. At the serialization level, it's easy to think that CDATA sections are different from other text, that character references are different from ordinary characters, that you should somehow be able to distinguish between <tag></tag> and <tag/>, that namespace prefixes are more important than the namespace URI, that an &nbsp; in an XHTML-style stream is different from a U+00A0 character in memory, and so on. In my experience, serialization-only thinking (at the receiving end) is the single most common cause of interoperability problems when it comes to general XML interchange.
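To make that concrete, here's a small sketch using Python's xml.etree.ElementTree (the documents and element names are made up for illustration): three of those serialization-level "differences" simply vanish once the parser hands you the data model.

```python
import xml.etree.ElementTree as ET

# Three serializations of the same data: literal text, a character
# reference, and a CDATA section.
docs = [
    '<doc><tag>caf\u00e9</tag></doc>',
    '<doc><tag>caf&#233;</tag></doc>',
    '<doc><tag><![CDATA[caf\u00e9]]></tag></doc>',
]
print({ET.fromstring(d).find('tag').text for d in docs})  # {'café'}

# <tag></tag> vs <tag/>: indistinguishable in the data model.
a = ET.fromstring('<doc><tag></tag></doc>').find('tag')
b = ET.fromstring('<doc><tag/></doc>').find('tag')
print(a.text == b.text)  # True -- both are empty

# Different prefixes, same namespace URI: the same element name.
x = ET.fromstring('<a:doc xmlns:a="urn:example"/>')
y = ET.fromstring('<b:doc xmlns:b="urn:example"/>')
print(x.tag == y.tag)  # True -- both are '{urn:example}doc'
```

Any code that compares the parsed text, or the `{uri}local` tag names, agrees on all of these; code that compares the raw byte streams does not.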
But when you focus on the data model, and treat the serialization as an implementation detail, to be addressed by a library written by someone who's actually read the specifications a few more times than you have, all those problems tend to just go away. Things just work.

And in practice, of course, most software engineers understand this, and care about this. After all, good software engineering is about abstractions and decoupling and designing things so you can focus on one part of the problem at a time. And about making your customer happy, and having fun while doing that. Not staying up all night to look for an obscure interoperability problem that you finally discover is caused by someone using a CDATA section where you expected a character reference, in 0.1% of all production records, but in none of the files in your test data set.

(By the way, did ET fail to *read* your XML documents? I thought your complaint was that it didn't put the things it read in a place where you expected them to be, and that you didn't have time to learn how to deal with that because you had more important things to do, at the time?)

</F>

-- 
http://mail.python.org/mailman/listinfo/python-list