So, Google has decided to "share" their "replacement" for XML:
http://www.betanews.com/article/Google_releases_its_data_encoding_format_to_compete_with_XML/1215530589
http://google-opensource.blogspot.com/2008/07/protocol-buffers-googles-data.html
Now, I'm no fan of XML. I loathe XML. XML is only "self-describing" in
the sense that I can read it by hand and generally puzzle out how to
write something to munge it.
So, Google's innovation is that you create a "description file" and then
"compile it" to a class.
So, a fast self-describing wire format was simply too much for the
geniuses at Google? They wrote an IDL?
Now, maybe Google really can't afford the extra translation step that
you need to go through. They have strict performance requirements. And
they use C++, so you would need an "unpack the protocol into a class"
step that languages with runtime reflection, like Java and Python, don't.
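
To show what I mean by skipping the "unpack into a class" step, here is
a minimal Python sketch. The message and its field names are made up
for illustration, and JSON is only standing in for "some self-describing
format"; this is not anything Google ships.

```python
import json
from types import SimpleNamespace

# Hypothetical message; the field names are invented for illustration.
wire = '{"name": "widget", "count": 3, "sizes": [1.5, 2.0]}'

# object_hook is called for every decoded JSON object, so each one
# becomes an attribute-style object without any generated class.
msg = json.loads(wire, object_hook=lambda d: SimpleNamespace(**d))

print(msg.name, msg.count, msg.sizes)
```

No description file and no compile step, at the cost of a slower and
fatter encoding. Whether that cost matters at Google's scale is exactly
the part I can't judge from outside.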
You know what? I doubt it. I've seen NIH before, and this *stinks* of
NIH. Especially with quotes like:
> Do we write hand-coded parsing and serialization routines for each
> data structure? Well, we used to. Needless to say, that didn't last
> long. When you have tens of thousands of different structures in your
> code base that need their own serialization formats, you simply
> cannot write them all by hand.
*boggles*.
The biggest problem is simply that you now have a "stored blob of data"
separated from "how to interpret that stored blob of data". Ask NASA
how well that works out over the long term. A hint: it doesn't. They
have lots of tapes with no way to interpret them because they can't even
tell what kind of numbers are on them.
Humans are pretty good at puzzling out things if they know stuff like
"That's a char. That's a string. That's an array of doubles." When
you don't even have that, you are *hosed*.
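
A small Python sketch of the problem, using the standard struct module.
The blob here is my own made-up example (a single little-endian double),
not real archive data. The same eight bytes decode happily under three
different guesses about what they hold:

```python
import struct

# Hypothetical stored blob: one little-endian IEEE 754 double.
blob = struct.pack("<d", 1.0)

# Without a description, each of these readings is equally "valid":
# none of them raises an error, and only one is what the writer meant.
as_double = struct.unpack("<d", blob)[0]   # one 64-bit float
as_int64 = struct.unpack("<q", blob)[0]    # one signed 64-bit integer
as_two_floats = struct.unpack("<2f", blob) # two 32-bit floats

print(as_double)
print(as_int64)
print(as_two_floats)
```

Nothing in the bytes themselves tells you which reading is the right
one. That information lives somewhere else, and "somewhere else" is what
gets lost.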
This is, IMO, the *SINGLE* advantage that XML brings to the party.
Crappy as it is, almost everybody agreed to use it, and you can see
where the delimiters are. Oh, and they also handled a few tiny problems
like escape quoting and foreign characters. But let's not get too advanced.
It gets worse:
> But, IDLs in general have earned a reputation for being hopelessly
> complicated. On the other hand, one of Protocol Buffers' major design
> goals is simplicity.
The moment I hear: "Our goal is simplicity" for a well-understood,
well-trodden area I hear:
"We don't feel like taking the time to understand *why* all those other
people did complicated stuff in functional, portable, debugged
libraries. And, we're geniuses, so we'll roll our own. Because writing
your own code is more fun than understanding someone else's. Oh, and
our stuff will have lots of bugs, get nice and complicated as we either
add those other features or develop horrible hacks to work around the
limitations."
There exist a ton of self-describing formats nowadays. ASN.1 is pretty
compact, last I checked. JSON (JavaScript Object Notation) isn't bad.
There are *MANY* others.
The worst part of this is that there are 10 gazillion programmers going
"Oh, if Google uses it, it must be the thing to use."
You know, we just got world+dog to start using hash tables wherein you
actually give the thing you want to store a name rather than a number or
a memory location. It makes it easy to put into "classes", and
sometimes you don't even do that because *gasp* you can access it by
name *anyhow*.
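
Roughly what I mean, as a Python sketch. The field names, the tag
numbers, and the "schema" dict are all hypothetical, and the numbered
dict is only an analogy for fields identified by number; it is not the
actual Protocol Buffers wire format.

```python
import json

# Self-describing: the names travel with the data.
named = json.loads('{"host": "example.org", "port": 8080}')
print(named["port"])

# Number-keyed: the same values, identified only by field number.
numbered = {1: "example.org", 2: 8080}

# Hypothetical stand-in for the separate description file. Lose this
# and fields 1 and 2 are just "a string" and "an integer".
schema = {1: "host", 2: "port"}

decoded = {schema[tag]: value for tag, value in numbered.items()}
print(decoded == named)
```

The second form only gets its names back as long as the schema is still
around and still matches the data.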
Now Google is setting us back 20 years. Sigh.
-a
--
KPLUG-LPSG@kernel-panic.org
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg