To add my two cents to this discussion: While I have my doubts that any single parser design could fit the needs of more than maybe half (being generous) of the possible applications out there, it's still a worthwhile project.
I would avoid producing output in any format that needs further text parsing, such as a plain-text or XML format. That leaves a design based on C structures as a good Least Common Denominator -- most languages can access C structures in some fashion. But C structures have two big problems. First, they aren't easily extensible, which makes it difficult to add features and change things. Second, if they contain pointers, those pointers become invalid when the structure is moved to a different place in memory. (This can happen with inter-process communication on some platforms, or when the data is handed to another language.) However, with proper planning, you can make the returned format highly extensible and highly portable by taking a few steps early on:

1) Your parsing routine should return the data as a single linear buffer. While parsing, you can keep everything in separately allocated structures, but at the very end, once all the data is in structure form, calculate the size of the final output buffer, allocate space for it, and copy all the structure data into it.

2) There can be NO pointers in the buffer. Instead, store offsets from the start of the buffer, which serve the same purpose. Those fields can be declared in the structures as pointers and used as such right up to the point where each structure is copied into the output buffer. Combine this with #1 and you become pretty much language agnostic. You can even turn the parser into a UNIX filter tool, or save the buffer as a file.

3) At the beginning of each structure definition, place a trio of fields that identify the type of the structure, the length of the structure, and a general-purpose linked-list pointer/offset. This gives you extensibility -- the caller can simply ignore any structure whose type it doesn't know, and you can later add fields to a structure, which callers can check for by seeing whether the length is long enough.
(In the past, I used to put a version number in the structure instead of a length, but I found that I only rarely revised a structure -- more often I'd add a new structure type -- and almost always the revision made it longer. So having the length was more convenient, especially when calculating the buffer size.)

4) Rather than putting a lot of data into a single large structure, break it up into many smaller chunks, each in its own structure, in a linked list. Combined with #3, this gives you a great deal of flexibility and extensibility. (For example, all your tune background info -- history, author, copyright, discography, origin, etc. -- could live in a single linked list of structures pointed to by an "info" pointer/offset field in the Tune structure. And down the road, if someone added an ABC field for Similar Tunes, say, you could just create another structure type and add an instance of it to the "info" linked list.)

5) Avoid arrays of structures, or anything that assumes structure Y follows structure X in the buffer. Every bit of data should be reachable by following a chain of pointers/offsets down from the first structure. Use the general-purpose linked-list pointer to group things together, and pointer/offset fields inside the structures to point to other linked lists. You can use arrays of pointers/offsets as well, but those arrays should be fixed length.

I've done this kind of thing on a number of projects (not ABC related), and it has always paid off nicely. I'm not saying you have to do it this way, but if you do, you will maximize your portability and gain a lot of flexibility.

-->Steve Bennett

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html
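Here's a rough sketch in C of steps 1 and 2, with the header trio from step 3 on every record. It's only an illustration under stated assumptions: the record types (`TuneRec`, `InfoRec`), the type codes, and `flatten_tune()` are hypothetical names, not part of any existing ABC parser. It also departs slightly from the description above: instead of storing offsets in fields declared as pointers, it keeps separate parse-time structures (`Tune`, `Info`) with real pointers and writes fixed-size `uint32_t` offsets into the output records, so the layout doesn't change with pointer size. An offset of 0 means "no link", which works because the root record at offset 0 is never the target of a link. Byte order isn't handled, so a saved buffer is only readable as-is on machines with the same endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type codes for this sketch. */
enum { REC_TUNE = 1, REC_INFO = 2 };

/* Output records: fixed-size fields, byte offsets instead of pointers.
   Every record starts with the same header: type, total length, and a
   general-purpose "next" link (offset from buffer start, 0 = none). */
typedef struct { uint32_t type, length, next; } RecHeader;
typedef struct { RecHeader hdr; uint32_t title, info; } TuneRec;  /* info: first InfoRec */
typedef struct { RecHeader hdr; uint32_t kind, text; } InfoRec;   /* text: string offset */

/* Parse-time structures: ordinary pointers, allocated separately. */
typedef struct Info { uint32_t kind; const char *text; struct Info *next; } Info;
typedef struct { const char *title; Info *info; } Tune;

/* Copy a string into the string area and return its offset. */
static uint32_t put_str(unsigned char *buf, size_t *pos, const char *s)
{
    size_t len = strlen(s) + 1;        /* keep the terminating NUL */
    memcpy(buf + *pos, s, len);
    uint32_t off = (uint32_t)*pos;
    *pos += len;
    return off;
}

/* Flatten a parsed tune into one malloc'd, pointer-free buffer.
   Records go first (their sizes are multiples of 4, so they stay aligned),
   strings after them. Returns NULL if allocation fails. */
unsigned char *flatten_tune(const Tune *t, size_t *out_size)
{
    /* Pass 1: calculate the size of the final buffer. */
    size_t rec_bytes = sizeof(TuneRec), str_bytes = strlen(t->title) + 1;
    for (const Info *i = t->info; i; i = i->next) {
        rec_bytes += sizeof(InfoRec);
        str_bytes += strlen(i->text) + 1;
    }
    size_t size = rec_bytes + str_bytes;
    unsigned char *buf = calloc(1, size);   /* zeroed: no stray bytes in files */
    if (!buf) return NULL;

    /* Pass 2: copy the data in, turning each pointer into an offset. */
    size_t rec = 0, str = rec_bytes;
    TuneRec tr = { { REC_TUNE, sizeof(TuneRec), 0 }, 0, 0 };
    tr.title = put_str(buf, &str, t->title);
    rec += sizeof(TuneRec);
    tr.info = t->info ? (uint32_t)rec : 0;  /* first InfoRec follows the TuneRec */
    memcpy(buf, &tr, sizeof tr);

    for (const Info *i = t->info; i; i = i->next) {
        InfoRec ir = { { REC_INFO, sizeof(InfoRec), 0 }, i->kind, 0 };
        ir.text = put_str(buf, &str, i->text);
        /* This writer happens to place the next InfoRec right after this
           one, but readers should only follow hdr.next, never adjacency. */
        ir.hdr.next = i->next ? (uint32_t)(rec + sizeof(InfoRec)) : 0;
        memcpy(buf + rec, &ir, sizeof ir);
        rec += sizeof(InfoRec);
    }
    assert(rec == rec_bytes && str == size);  /* both passes must agree */
    *out_size = size;
    return buf;
}
```

The caller gets back one block of memory that it can write to a file, pipe to another process, or copy somewhere else, and release with a single `free()`. Because every link is an offset, a copy of the buffer reads just as well as the original.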
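A matching sketch of the reading side shows how the header trio from step 3 lets an older reader cope with newer data. The names and the extra `lang` field are hypothetical, assumed only for illustration. The reader follows `next` offsets, never assuming one record follows another (step 5). It skips any record type it doesn't recognize, say a Similar Tunes structure added by a newer writer, and only reads the later-added `lang` field when the record's length shows it's there. Headers are copied out with `memcpy`, so the buffer doesn't need any particular alignment, which matters if it was just read from a file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same hypothetical header a writer would use: type, total record
   length in bytes, and a "next" offset from buffer start (0 = none). */
typedef struct { uint32_t type, length, next; } RecHeader;

enum { REC_INFO = 2 };  /* hypothetical type code */

/* An info record as this reader knows it. Suppose `lang` was appended
   in a later revision: older writers emit 20-byte records without it. */
typedef struct {
    RecHeader hdr;
    uint32_t kind;   /* hypothetical: 1 = author, 2 = origin, ... */
    uint32_t text;   /* offset of a NUL-terminated string */
    uint32_t lang;   /* later addition; reported as 0 when absent */
} InfoRec;

/* Readers in other languages rely on this exact layout. */
static_assert(sizeof(RecHeader) == 12 && sizeof(InfoRec) == 24,
              "unexpected padding in record layout");

/* Walk one linked list starting at offset `first`, reporting kind and
   lang for each INFO record. Records of unknown type are skipped by
   following their link. Returns the number of INFO records reported. */
size_t read_info_list(const unsigned char *buf, size_t size, uint32_t first,
                      uint32_t kinds[], uint32_t langs[], size_t max)
{
    size_t n = 0, steps = 0;
    uint32_t off = first;
    while (off != 0 && n < max) {
        RecHeader h;
        /* Refuse to read past the buffer or to loop forever on bad links. */
        if (off > size || size - off < sizeof h || ++steps > size / sizeof h)
            break;
        memcpy(&h, buf + off, sizeof h);  /* memcpy: no alignment assumptions */
        if (h.length < sizeof h || h.length > size - off)
            break;

        if (h.type == REC_INFO && h.length >= offsetof(InfoRec, lang)) {
            InfoRec r = {0};
            /* Copy only as many bytes as this record actually has. */
            memcpy(&r, buf + off, h.length < sizeof r ? h.length : sizeof r);
            kinds[n] = r.kind;
            /* Check the length before trusting a field added later. */
            langs[n] = h.length >= offsetof(InfoRec, lang) + sizeof r.lang ? r.lang : 0;
            n++;
        }
        /* Unknown types (or INFO records too short to use) are ignored. */
        off = h.next;
    }
    return n;
}
```

The bounds and loop checks are a choice of this sketch rather than part of the scheme itself. When the buffer comes from a file or another process, they're cheap insurance against corrupt data.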
