Christopher Smith wrote:
I would argue you can cope with the idiocy regardless. However, it is
fair to say that XML does provide some buffers against certain types of
idiocy. That said, there are several other approaches which do a much
better job of buffering idiocy.
Maybe. But writing a program to eat "HSPICE Data Output File" format is
a lot easier when it is in undocumented XML than when it is in
undocumented binary. The simple addition of knowing where the
delimiters are helps tremendously.
Actually, this is one of my interview questions for VLSI CAD tool
administrators. I give them a structured file (slightly modified Spice
simulator input deck) and ask them to write code to cope with it. Those
who use regexes fail--they invariably have silent failure modes (*very*
bad when your script may be a check which has to prevent a $1 million
mistake).
It's weird, any regexp library I've seen has a "match" operation that
can and does fail when it doesn't get a match. That said, trying to use
a regexp to parse a file format is an incredible pain to get right.
The problem is rarely failing to get a match. It's getting a
false match.
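To illustrate the false-match failure mode, here is a toy sketch (the deck lines, the pattern, and the card format are all invented for the example, not taken from any real SPICE dialect). The pattern "works" on a real resistor card, but it also silently matches the same text inside a comment line, so the script reports bogus data instead of failing:

```python
import re

# Hypothetical deck lines: a resistor card, and a comment that merely
# mentions a resistor. A naive pattern matches both.
lines = [
    "R1 node1 node2 1k",               # real resistor card
    "* swap R1 node1 node2 1k later",  # comment -- should be ignored
]

pattern = re.compile(r"R\d+\s+(\w+)\s+(\w+)\s+(\S+)")

for line in lines:
    m = pattern.search(line)
    if m:
        # Both lines land here -- the comment produces a false match,
        # and nothing anywhere signals that something went wrong.
        print("matched:", m.groups())
```

Note that the match operation itself never "failed" -- it succeeded twice, which is exactly the silent failure mode: the error surfaces only later, in the downstream result.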
Regardless of whether it's written in XML, you can write a grammar for
what you perceive is this undocumented format and use it to validate
data. Unfortunately, much as with a DTD or Schema that's created in a
black box scenario, you might end up with some false negatives.
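A minimal sketch of that validate-against-a-grammar idea (the "productions" here are a made-up subset of a deck format, chosen only to show the shape of the approach): every line must match exactly one known production, and anything unrecognized is rejected loudly rather than guessed at. The false-negative risk is visible too -- any legitimate card type you didn't anticipate gets flagged as an error.

```python
import re

# A tiny, hypothetical "grammar" for a deck subset. These productions
# are illustrative, not a real SPICE grammar.
PRODUCTIONS = {
    "comment":  re.compile(r"\*.*"),
    "resistor": re.compile(r"R\w+\s+\w+\s+\w+\s+\S+"),
    "end":      re.compile(r"\.end", re.IGNORECASE),
}

def validate(deck_lines):
    """Return a list of (line_number, text) for every unrecognized line."""
    errors = []
    for n, line in enumerate(deck_lines, 1):
        line = line.strip()
        if not line:
            continue
        if not any(p.fullmatch(line) for p in PRODUCTIONS.values()):
            errors.append((n, line))  # reject loudly; never guess
    return errors

print(validate(["R1 a b 1k", "Rload out 0 50", "garbage here", ".END"]))
```

Run against the sample above, only line 3 is flagged; a correct-but-unanticipated card type would be flagged the same way, which is the black-box false negative the paragraph warns about.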
For binary formats, that's not really true. Writing a validator for
binary formats means bugs in the "schema" as well as the "validator".
I've had to try to get Unicode data moving back and forth between Oracle
databases, message queuing software, and tools written in different
languages, and in my experience each transition from one tool to
another involved some lovely compatibility issues, often even when just
using the old character-set world would have made it pretty easy.
Oh, yeah. Fortunately, most things are now speaking UTF-8.
I am surprised that there isn't a Boost substitute for C++ String that
is fully Unicode compliant. C++ STL String has lots of idiocies.
<googles>
Yecch. It looks like C++ hasn't made any progress.
If done properly, it's possible to have a data set large enough
that it can't be rendered into a DOM on a 32-bit machine, but can still be
rendered into a parse tree for a much better-defined format.
Polygon information normally can't be rendered into a DOM format anyway.
Accessing polygons hierarchically rather than spatially makes very
little sense.
Again, I think you give XML too much credit. If you want to design a
format that is extensible, it's not hard to do it.
Actually, it is. Your parser has to parse generally. Most people who
design a format invariably create a parser with specific assumptions
because "it's easier". Later, they can't change that because "we have
all this existing data".
Using XML forces the use of general parsing early on. Especially since
small parsing jobs tend to use DOM to start since it can normally be
rendered directly into an in-memory, tree data structure.
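A minimal sketch of that DOM-style pattern, using Python's standard-library ElementTree (the element names and values here are invented, not from any actual tool's output): the whole document is parsed generally into an in-memory tree first, and only then does format-specific code walk it.

```python
import xml.etree.ElementTree as ET

# Hypothetical simulator output. The parser below knows nothing about
# this format; it just builds the tree, and the format-specific logic
# is a query over that tree.
doc = """<results>
  <sweep name="vdd">
    <point v="1.0" i="0.012"/>
    <point v="1.1" i="0.014"/>
  </sweep>
</results>"""

root = ET.fromstring(doc)
points = [(p.get("v"), p.get("i")) for p in root.iter("point")]
print(points)  # [('1.0', '0.012'), ('1.1', '0.014')]
```

Because the general parse happens first, adding a new element later (extensibility) doesn't break old readers -- they simply don't query for it.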
Yes, the key advantage of XML is the idiot buy-in factor. If you did
s/XML/ASN.1/, they'd say you were being ridiculous. Unfortunately,
whenever you try to make something idiot proof, they just build a better
idiot, which is exactly what happens when you pressure someone to
provide an XML interface to their tool. :-(
The main problem there is that the vendors actively
sabotage interchange, because being able to dump your data allows you to change
vendors. In the VLSI design industry, the vendors sabotaged EDIF
generators just like they now sabotage their XML generators.
-a
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg