To add my two cents to this discussion:  While I have my doubts that any
single parser design could fit the needs of more than maybe half (being
generous) of the possible applications out there, it's still a worthwhile
project.

I would avoid producing output in any format which needs further text
parsing, such as a text or XML format.  That leaves a C structure-based
design as a good Least Common Denominator -- most languages can access C
structures in some fashion.

But the problem with C structures is that they aren't easily extensible,
which makes it difficult to add features and change things.  Another huge
problem with C structures is that if they contain pointers, those pointers
become invalid when the structure is moved through memory (as might
happen with inter-process communication on some platforms, or when the
structure is handed to another language).

However, with proper planning, you can make the returned format highly
extensible and highly portable by taking a couple of steps early on:


1) Your parsing routine should return the data as a single linear buffer.
While parsing, you can keep everything in separately allocated structures,
but at the very end, when all the data is in structure form, you should
calculate the size of the final output buffer, allocate space for it, and
copy all the structure data into it.
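
As a rough sketch of that last step (not code from any real ABC parser),
here is one way the two passes might look.  The Field and FlatField types
and the flatten() name are made up for illustration, and I'm assuming text
fields are NUL-terminated strings.  Links in the output are stored as
offsets from the start of the buffer rather than as pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse-time node: separately allocated, linked by pointers. */
typedef struct Field {
    struct Field *next;
    char          key;    /* e.g. 'T' for a title line */
    const char   *text;   /* NUL-terminated */
} Field;

/* Hypothetical flattened record: this header, then the text bytes. */
typedef struct {
    uint32_t next;        /* offset of next record from buffer start, 0 = end */
    uint32_t key;
} FlatField;

/* Bytes one node occupies in the output, rounded up to a multiple of 4. */
static size_t flat_size(const Field *f)
{
    size_t n = sizeof(FlatField) + strlen(f->text) + 1;
    return (n + 3u) & ~(size_t)3u;
}

/* Pass 1 sums the sizes, pass 2 copies everything into one allocation.
   The caller owns the returned buffer and releases it with free(). */
unsigned char *flatten(const Field *head, size_t *out_size)
{
    size_t total = 0;
    for (const Field *f = head; f; f = f->next)
        total += flat_size(f);

    unsigned char *buf = calloc(1, total ? total : 1);
    if (!buf)
        return NULL;

    size_t off = 0;
    for (const Field *f = head; f; f = f->next) {
        size_t size = flat_size(f);
        FlatField rec;
        rec.next = f->next ? (uint32_t)(off + size) : 0;
        rec.key  = (uint32_t)(unsigned char)f->key;
        memcpy(buf + off, &rec, sizeof rec);
        memcpy(buf + off + sizeof rec, f->text, strlen(f->text) + 1);
        off += size;
    }
    assert(off == total);   /* both passes must agree on the size */
    *out_size = total;
    return buf;
}
```

The caller gets back one block and one size, which is all it needs in order
to write the result to a file or pass it across a process boundary.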


2) There can be NO pointers in the buffer.  Instead, you can specify offset
values from the start of the buffer, which will serve the same purpose.
Those fields can be declared in the structures as pointers and used as such
right up to the point where the structure is copied into the output buffer,
at which time each pointer is converted to an offset.

Use this in combination with #1, and you become pretty much language
agnostic.  You can even turn the parser into a UNIX filter tool, or save the
buffer as a file.
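
To show why offsets survive a move where pointers would not, here is a
small sketch.  The Tune layout and the at(), make_tune(), tune_title() and
tune_meter() names are all hypothetical, and I'm assuming offset 0 is
reserved to mean "nothing here" (the top-level structure sits at offset 0,
so nothing ever needs to point to it).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical top-level structure, always placed at offset 0. */
typedef struct {
    uint32_t title;   /* offset of a NUL-terminated string, 0 = none */
    uint32_t meter;   /* offset of a NUL-terminated string, 0 = none */
} Tune;

/* Turn an offset into a pointer relative to wherever the buffer is now.
   Returns NULL for the "none" offset or one that is out of range.  (It
   does not check that a string's terminator lies inside the buffer.) */
const void *at(const void *base, size_t size, uint32_t off)
{
    if (off == 0 || off >= size)
        return NULL;
    return (const unsigned char *)base + off;
}

/* Build a tiny buffer: the Tune structure followed by its two strings. */
unsigned char *make_tune(const char *title, const char *meter, size_t *out_size)
{
    assert(title && meter && out_size);
    size_t tlen = strlen(title) + 1, mlen = strlen(meter) + 1;
    size_t total = sizeof(Tune) + tlen + mlen;
    unsigned char *buf = malloc(total);
    if (!buf)
        return NULL;

    Tune t;
    t.title = (uint32_t)sizeof(Tune);
    t.meter = (uint32_t)(sizeof(Tune) + tlen);
    memcpy(buf, &t, sizeof t);
    memcpy(buf + t.title, title, tlen);
    memcpy(buf + t.meter, meter, mlen);
    *out_size = total;
    return buf;
}

const char *tune_title(const void *buf, size_t size)
{
    Tune t;
    if (size < sizeof t)
        return NULL;
    memcpy(&t, buf, sizeof t);
    return at(buf, size, t.title);
}

const char *tune_meter(const void *buf, size_t size)
{
    Tune t;
    if (size < sizeof t)
        return NULL;
    memcpy(&t, buf, sizeof t);
    return at(buf, size, t.meter);
}
```

If you memcpy() the whole buffer somewhere else and free the original,
tune_title() on the copy still works, because nothing inside the buffer
refers to an absolute address.  One caveat if the buffer goes into a file
read on a different machine: the integer fields would also need an agreed
byte order, which this sketch does not deal with.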


3) Place, at the beginning of each structure definition, a trio of variables
which identify the type of structure, length of the structure, and a general
purpose linked list pointer/offset value.  This gives you extensibility --
the caller can simply ignore any structure it encounters whose type it
doesn't know, and you can later add fields to the end of a structure;
callers can tell whether a new field is present by checking whether the
length is long enough to include it.

(In the past, I used to put a version number in the structure instead of a
length, but I found that only rarely would I revise a structure -- I'd more
often add a new structure type -- and almost always the revision would be to
make it longer, so having the length was more convenient, especially when it
came down to calculating the buffer size.)
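
Here is a sketch of how a caller might use that header.  Everything in it
is hypothetical: the Header and Tempo layouts, the type codes, and the
beat_unit field, which stands in for a field added in a later revision.
I'm also assuming the list starts with the first structure in the buffer
and that a "next" of 0 ends it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The trio at the start of every structure. */
typedef struct {
    uint32_t type;
    uint32_t length;   /* bytes in this structure, header included */
    uint32_t next;     /* offset of the next structure in the list, 0 = end */
} Header;

enum { T_TITLE = 1, T_TEMPO = 2 };   /* hypothetical type codes */

typedef struct {
    Header   h;
    uint32_t bpm;
    uint32_t beat_unit;   /* added later: present only if length covers it */
} Tempo;

/* True if the structure's length is long enough to include `field`. */
#define HAS_FIELD(hdr, type, field) \
    ((hdr).length >= offsetof(type, field) + sizeof(((type *)0)->field))

/* Walk the list from offset 0 looking for a Tempo.  Unknown types are
   skipped.  If the Tempo predates beat_unit, a default of 4 is reported.
   Returns 1 if found, 0 if not found or the buffer looks corrupt.
   (A list that loops back on itself is not guarded against here.) */
int find_tempo(const unsigned char *buf, size_t size,
               uint32_t *bpm, uint32_t *beat_unit)
{
    assert(buf && bpm && beat_unit);
    uint32_t off = 0;
    for (;;) {
        Header h;
        if (off > size || size - off < sizeof h)
            return 0;
        memcpy(&h, buf + off, sizeof h);
        if (h.length < sizeof h || h.length > size - off)
            return 0;

        if (h.type == T_TEMPO && HAS_FIELD(h, Tempo, bpm)) {
            Tempo t;
            memset(&t, 0, sizeof t);
            t.beat_unit = 4;   /* default for older, shorter structures */
            size_t n = h.length < sizeof t ? h.length : sizeof t;
            memcpy(&t, buf + off, n);
            *bpm = t.bpm;
            *beat_unit = t.beat_unit;
            return 1;
        }
        if (h.next == 0)
            return 0;
        off = h.next;
    }
}
```

The same caller copes with an old 16-byte Tempo and a newer 20-byte one,
and it never needs to know what the unrecognized structures contain.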


4) Rather than putting a lot of data into a single large structure, break it
up into many smaller chunks of data, each in its own structure, in a linked
list.  This, in combination with #3, gives you a whole lot of flexibility
and extensibility.

(For example, all your tune background info -- history, author, copyright,
discography, origin, etc. -- could be in a single linked list of
structures pointed to by an "info" pointer/offset field in the Tune
structure.  And down the road, if someone, say, added an ABC field for
Similar Tunes, you could just create another structure type and stick a copy
of that structure in the "info" linked list.)
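
A sketch of that "info" list follows, with made-up type codes and a
made-up Builder that, to keep the example short, appends straight into a
fixed-size buffer instead of flattening separate allocations at the end.
T_SIMILAR plays the part of the structure type added down the road.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t type, length, next; } Header;

/* Hypothetical Tune structure at offset 0. */
typedef struct {
    Header   h;
    uint32_t info;   /* offset of the first info chunk, 0 = none */
} Tune;

enum { T_TUNE = 1, T_AUTHOR = 10, T_ORIGIN = 11,
       T_SIMILAR = 12 /* added later; older readers just skip it */ };

typedef struct {
    unsigned char data[512];
    uint32_t      used;        /* bytes of data[] filled so far */
    uint32_t      last_info;   /* offset of the last chunk added, 0 = none */
} Builder;

void builder_init(Builder *b)
{
    Tune t = { { T_TUNE, (uint32_t)sizeof(Tune), 0 }, 0 };
    memset(b, 0, sizeof *b);
    memcpy(b->data, &t, sizeof t);
    b->used = (uint32_t)sizeof t;
}

/* Append one text chunk (a Header followed by a NUL-terminated string)
   to the end of the info list.  Returns 0 if it does not fit. */
int add_info(Builder *b, uint32_t type, const char *text)
{
    assert(b && text);
    size_t need = (sizeof(Header) + strlen(text) + 1 + 3u) & ~(size_t)3u;
    if (need > sizeof b->data - b->used)
        return 0;

    uint32_t off = b->used;
    Header h = { type, (uint32_t)need, 0 };
    memcpy(b->data + off, &h, sizeof h);
    memcpy(b->data + off + sizeof h, text, strlen(text) + 1);

    /* Link it in: from Tune.info if it is the first, else from the
       previous chunk's "next" field. */
    if (b->last_info == 0)
        memcpy(b->data + offsetof(Tune, info), &off, sizeof off);
    else
        memcpy(b->data + b->last_info + offsetof(Header, next), &off, sizeof off);

    b->last_info = off;
    b->used += (uint32_t)need;
    return 1;
}

/* Find the first chunk of a given type.  This trusts the buffer, which
   is reasonable here only because the same program just built it. */
const char *find_info(const unsigned char *buf, uint32_t type)
{
    Tune t;
    memcpy(&t, buf, sizeof t);
    for (uint32_t off = t.info; off != 0; ) {
        Header h;
        memcpy(&h, buf + off, sizeof h);
        if (h.type == type)
            return (const char *)buf + off + sizeof h;
        off = h.next;
    }
    return NULL;
}
```

Adding T_SIMILAR meant adding one type code.  The Tune structure and the
code that looks up authors or origins did not have to change.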


5) Avoid arrays of structures, or anything which assumes that structure Y
follows structure X in the buffer.  Every bit of data should be accessible
by following a pointer/offset chain down from the first structure.  Use the
general purpose linked list pointer to group things together, and
pointer/offsets inside the structures to point to other linked lists.  You
can use arrays of pointers/offsets as well, but each such array should be
of fixed length.
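
For instance, a reader might follow a fixed-length array of offsets like
this.  The Tune and Note layouts, MAX_VOICES and voice_pitches() are
invented for the sketch.  Each array slot holds the offset of the head of
one linked list, with 0 meaning the slot is unused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAX_VOICES = 4 };   /* hypothetical fixed array length */

typedef struct { uint32_t type, length, next; } Header;

typedef struct {
    Header   h;
    uint32_t voices[MAX_VOICES];   /* offset of each voice's first Note */
} Tune;                            /* always the first structure */

typedef struct {
    Header   h;                    /* h.next links to the following Note */
    uint32_t pitch;
} Note;

/* Collect up to `max` pitches for one voice by following the chain
   Tune -> voices[voice] -> Note -> Note ...  Nothing here depends on
   where in the buffer the Notes physically sit.  Returns the number of
   pitches stored, or -1 on a bad voice number or out-of-range offset.
   The `max` limit also stops the walk if the list loops on itself. */
int voice_pitches(const unsigned char *buf, size_t size, unsigned voice,
                  uint32_t *out, int max)
{
    assert(buf && out);
    Tune t;
    if (voice >= MAX_VOICES || size < sizeof t)
        return -1;
    memcpy(&t, buf, sizeof t);

    int n = 0;
    uint32_t off = t.voices[voice];
    while (off != 0 && n < max) {
        Note note;
        if (off > size || size - off < sizeof note)
            return -1;
        memcpy(&note, buf + off, sizeof note);
        out[n++] = note.pitch;
        off = note.h.next;
    }
    return n;
}
```

Because the order comes from the links and not from position, the writer
is free to lay structures out in whatever order is convenient, and a later
version can drop new structure types in between without disturbing
anything.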


I've done this kind of thing on a number of projects (not ABC related), and
it's always paid off nicely.  I'm not saying you have to do it this way, but
if you do, you will maximize your portability, and gain a lot of
flexibility.

-->Steve Bennett
