Re: Google NIH generates yet another incompatible data transfer language

Darren New Thu, 10 Jul 2008 09:03:26 -0700

Andrew Lentvorski wrote:

Except that it doesn't look like they even *thought* about ASN.1. They just thought about how they were abusing XML.

True. There are a lot of good protocol mechanisms in there. My point was more that XML has an awful lot of overhead if what you want is to ship tagged bytes around. That overhead comes from using XML to ship tagged bytes around instead of shipping marked up text around.
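
To put a rough number on that overhead, here is a minimal sketch that encodes the same two fields both ways. The record layout, the tag numbers, and the element names are made up for illustration, and the XML side assumes the values need no escaping.

```python
import struct

def encode_tlv(fields):
    """Encode (tag, value) pairs as 1-byte tag, 2-byte big-endian length, value."""
    out = bytearray()
    for tag, value in fields:
        out += struct.pack(">BH", tag, len(value))
        out += value
    return bytes(out)

def encode_xml(fields, names):
    """Encode the same pairs as XML elements; names maps tag -> element name.
    Assumes the values are plain ASCII with nothing that needs escaping."""
    parts = ["<record>"]
    for tag, value in fields:
        name = names[tag]
        parts.append("<%s>%s</%s>" % (name, value.decode("ascii"), name))
    parts.append("</record>")
    return "".join(parts).encode("ascii")

# Hypothetical record: field 1 is an id, field 2 is a flag.
fields = [(1, b"42"), (2, b"UP")]
names = {1: "id", 2: "flag"}

tlv = encode_tlv(fields)
xml = encode_xml(fields, names)
print(len(tlv), "bytes as tagged bytes,", len(xml), "bytes as XML")
```

The ratio depends entirely on how long the element names are relative to the payload, so treat the numbers as an illustration, not a benchmark.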

Yes and no. That's true if everything is just internal to your own program. However, once you start dumping data into a generalized persistent store (e.g. BigTable), that could be the difference between data that's useful and terabytes of dead data, because nobody can remember what program stuffed all that data there.


You could use UBF, where the data is the program that creates the data.

http://armstrongonsoftware.blogspot.com/2008/07/ubf-and-vm-opcocde-design.html

But yeah, ASN.1 has a premise that you're actually using it to describe standard data structures, so it needs to be documented in (say) CCITT or ISO standards. You can parse (most) ASN.1 without the description of what it is. Using XML doesn't really save you unless you're smart enough to use good tags.

Show me how to interpret XHTML without knowing the standard. Show me what <data><flag>UP</flag></data> is supposed to mean.


Knowing how the data is formatted is pretty independent of the container.

In addition, it loses the inline association between labeled delimiter and 
delimited data.  That's a large loss that many people won't think about.

Maybe I've just dealt with stupid people too much, but my experience is that "we use XML" means "we don't have to document what our data means because we can just hand you an example and hope you intuit that 'ID' isn't actually the primary key of the record, etc." Much the same way that "the documentation is on the wiki" really means "we have our customers try to reverse-engineer our system because we don't design anything up front."

XML handles a lot of things *right*. Unicode is good.

That's only necessary for a system designed primarily to handle large chunks of text. Any counted format will handle unicode just fine if your language does. If your language doesn't handle unicode cleanly, neither will the XML library in that language.
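
As a sketch of what "counted format" means here, assuming a 4-byte big-endian length prefix followed by UTF-8 bytes (the framing and the function names are my own choice for illustration, not any particular protocol):

```python
import struct

def pack_string(text):
    """Length-prefix a string: 4-byte big-endian byte count, then UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack(">I", len(data)) + data

def unpack_string(buf, offset=0):
    """Read one counted string; return (text, offset of the next item)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    end = start + length
    return buf[start:end].decode("utf-8"), end

# Non-ASCII characters and XML-looking punctuation need no escaping at all,
# because the reader never scans the payload for delimiters.
blob = pack_string("na\u00efve <tag> & \u2603") + pack_string("second")
first, pos = unpack_string(blob)
second, pos = unpack_string(blob, pos)
print(first, "|", second)
```

The count is in bytes, not characters, which is the one place people tend to trip when the language distinguishes the two.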

Named closing delimiters have some nice advantages.

Assuming you're not using any tools beyond a text editor to look at the data, sure, perhaps.

It handles tree structures from the start.

It's worse. It handles <trees> with nodes <stuck>stuck</stuck> in the middle of other </trees>.
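
To see what that mixed content does to the "tree" you get back, here is a small sketch using Python's standard xml.etree.ElementTree. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A text node with an element stuck in the middle of it.
root = ET.fromstring("<trees>nodes <stuck>stuck</stuck> in the middle</trees>")
child = root[0]

print(repr(root.text))              # text before the child element
print(child.tag, repr(child.text))  # the child and its own text
print(repr(child.tail))             # text after the child hangs off the child
```

The text that follows the child is stored on the child as its tail rather than on the parent, so code that only looks at tags and .text silently drops it.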

It handles character escaping correctly--a task which *everyone* seems to get wrong in the first 10 versions of a format.

Again, assuming you're using a delimited format rather than a counted format. And lots of people manage to get the escaping right. XML's escaping is just uglier because it's designed to have text nodes that look like tree nodes.

Oh, and I've interfaced to plenty of XML implementations that don't get the escaping right. XML only gets it right when the person implementing it uses an XML library that's done right, instead of hand-rolling a one-off because they don't know any better.
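
The failure mode looks roughly like this. It's a sketch using Python's standard library; the two writer functions are hypothetical stand-ins for a hand-rolled emitter and one that lets the library do the escaping.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def hand_rolled(name, value):
    """The one-off approach: paste the value straight between the tags."""
    return "<%s>%s</%s>" % (name, value, name)

def with_library(name, value):
    """Same shape, but the library escapes &, < and > in the value."""
    return "<%s>%s</%s>" % (name, escape(value), name)

value = "AT&T <rocks>"
bad = hand_rolled("note", value)
good = with_library("note", value)

try:
    ET.fromstring(bad)
    bad_parses = True
except ET.ParseError:
    bad_parses = False

round_trip = ET.fromstring(good).text
print("hand-rolled parses:", bad_parses)
print("escaped round-trips:", round_trip == value)
```

The hand-rolled output works for every test record that happens not to contain an ampersand or an angle bracket, which is why it ships.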

The XML APIs actually do most of the hard parsing garbage for you. XML APIs exist in practically every language of any usefulness. And everybody now seems to use XML by default.


Yeah. These are actually real advantages, I'll grant you. :-)

I think the real difference is that you don't have to actually have the spec there to parse XML. That is, you don't really need to configure the library to do the parsing. You can just hand a blob of XML to a library and get back a tree. There are lots of formats that work this way these days, tho. The old stuff, like Sun's RPC encoding, was awful in this respect. But JSON, ASN.1, etc all handle parsing without a spec underneath. (Of course, you have to avoid IMPLICIT in your ASN.1)
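
For the ASN.1 case, here is a minimal sketch of a spec-free parse of BER-encoded bytes into a tree. It is deliberately incomplete: it assumes single-byte tags and definite lengths only, and parse_ber is my own helper name, not a library call.

```python
def parse_ber(buf, offset=0, end=None):
    """Parse BER TLVs into (tag_byte, value) tuples with no schema.
    Constructed values become nested lists; primitive values stay raw bytes.
    Sketch only: single-byte tags, definite lengths."""
    if end is None:
        end = len(buf)
    items = []
    while offset < end:
        tag = buf[offset]
        length = buf[offset + 1]
        offset += 2
        if length & 0x80:
            # Long form: low 7 bits give the number of length bytes that follow.
            n = length & 0x7F
            length = int.from_bytes(buf[offset:offset + n], "big")
            offset += n
        if tag & 0x20:
            # Constructed bit set: the contents are themselves TLVs.
            value = parse_ber(buf, offset, offset + length)
        else:
            value = bytes(buf[offset:offset + length])
        items.append((tag, value))
        offset += length
    return items

# SEQUENCE { INTEGER 5, UTF8String "UP" } with universal tags intact.
blob = bytes([0x30, 0x07, 0x02, 0x01, 0x05, 0x0C, 0x02]) + b"UP"
tree = parse_ber(blob)

# [0] IMPLICIT INTEGER 5: the universal INTEGER tag (0x02) is replaced by 0x80.
implicit = parse_ber(bytes([0x80, 0x01, 0x05]))

print(tree)
print(implicit)
```

With universal tags you can tell an INTEGER (0x02) from a UTF8String (0x0C) just by looking. With the IMPLICIT version you still get the byte boundaries, but the tag 0x80 no longer says what type the contents are, so you need the spec to interpret them.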


--
Darren New / San Diego, CA, USA (PST)
 Helpful housekeeping hints:
  Check your feather pillows for holes
   before putting them in the washing machine.

--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
