Re: [abcusers] ABCp output data structure

2004-09-10 Thread Bernard Hill
In message [EMAIL PROTECTED], Paul Rosen 
[EMAIL PROTECTED] writes
as you might have read in other posts, I would be very interested in any
work on API for accessing ABC file once parsed. I still did not have a clue
for creating one and I would welcome any suggestion! Just let me know when
you got an idea.
I would break the problem into two parts: first decide what data needs to be
represented, then figure out the physical layout.
Here's my first shot at a comprehensive description of the data:
Have a header section followed by a repeating field section.
The header section contains:
version - (probably either 1.6 or 2.0 for now)
tune number - int
title - arbitrary length string.
area - arbitrary length string.
book - arbitrary length string
composer - arbitrary length string.
discography - arbitrary length string.
elemskip - arbitrary length string.[What does this do?]
group - arbitrary length string.
history - arbitrary length string.
information - arbitrary length string.
notes - arbitrary length string.
origin - arbitrary length string.
source - arbitrary length string.
transcription notes - arbitrary length string.
rhythm - arbitrary length string.[can this be interpreted in any way?]
default length - double?
meter - double?
String. To include C, C| or ¢ or 3+3+2/8 or 4 (which is the 
same as 4/1). And don't forget that C| is not always 2/2. It can be 
n/2 in musical terms.

tempo - [note length and beats per minute] double? and int
or string. To include allegro etc.
parts - array of bytes
starting key - enum
Going from what to what? Are you including minor keys?
Or is this just a list of accidentals? What about 
one-sharp-plus-one-flat (etc)

I would suggest an array of [-2..2]
[I'd suggest the following additional fields that aren't in the spec:
clef, copyright and additional lyrics]
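As a rough illustration of the "array of [-2..2]" suggestion for the key, here is a minimal Python sketch. It assumes one entry per note letter, with -2 meaning double flat and +2 double sharp. The letter ordering and the helper name key_signature are hypothetical and not part of any ABC specification.

```python
# Sketch only: a key signature as seven accidentals, each in -2..2
# (-2 double flat, -1 flat, 0 natural, +1 sharp, +2 double sharp).
LETTERS = tuple("CDEFGAB")  # assumed ordering; the thread does not fix one


def key_signature(**accidentals):
    """Build the 7-entry array from keyword arguments such as F=1."""
    signature = [0] * len(LETTERS)
    for letter, shift in accidentals.items():
        if letter not in LETTERS:
            raise ValueError(f"unknown note letter: {letter!r}")
        if not -2 <= shift <= 2:
            raise ValueError(f"accidental out of range: {shift}")
        signature[LETTERS.index(letter)] = shift
    return signature


one_sharp = key_signature(F=1)             # e.g. G major
sharp_and_flat = key_signature(F=1, B=-1)  # one sharp plus one flat
```

An array like this can hold one-sharp-plus-one-flat signatures, but it does not record the mode: G major and E minor produce the same array, so the mode would need its own field if it matters.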
Then the header section is followed by a set of repeating fields of one of
four types.
The types are: note element, bar element, formatting element, and header
element.
the HEADER element is one of:
key (as above)
elemskip [What does this do?]
default length (as above)
meter (as above)
part - byte
tempo (as above)
title (as above)
words [how is this supposed to look?]
The BAR element contains:
bar type - enum (single, repeat left, etc.)
start ending - bit field
ending number - int
end ending - bit field
What about fermata over a barline?
The FORMATTING element is one of:
End of line
break beam
How do you handle principal beams? ie
---
   
|   |  |   |   |  |
|   |  |   |   |  |
|   |  |   |   |  |
The NOTE element contains:
guitar chord unrecognized - arbitrary length string
guitar chord recognized - root pitch (enum), type (enum), base note (enum)
gracings - enum
bowing - enum
staccato/legato - enum
one or more stacked notes, containing:
   grace notes - array of pitches
Distinction between acciaccatura and appoggiatura?
How do you handle chords in grace notes?

   pitch - enum [includes an enum for a rest]
   length - int
what's the unit here?
   start tie - bit field
   end tie - bit field
   start slur - bit field
   end slur - bit field
What's a bit field? Is it just an array of 0..1? If so, I don't 
understand how a start tie et al is a bit field.

[I'd also recommend the following extension: an array of syllables to appear
as the lyrics under the note.]
[Also, can we add loudness, fermat, and start crescendo, end
crescendo, fingering, retard, a tempo, etc.?]
Fermat?
What about the articulations: accent (>), tenuto, legato, staccato, 
staccatissimo, martellato and don't forget these can be at the stem end 
and the head end.

---
I think the above is fairly complete.
What about 1st/2nd/3rd time endings for repeats. Coda sections and DC 
al fine and such like.

What about the type of binding between staves making systems? ie 
brackets, simple line, brace, brace+bracket?

What about arpeggiando marks?
Caesura and commas.
Note size (cue sized, grace notes, normal).
Stave size (cue sized or normal)
Fingerings.
Now, representing it in a way that allows
ease of use in all programming languages is tougher.
No it's not. Make it strings.
The way I'd represent it without using objects is with a stream of variable
length fields. That is, there would be a series of [type length data]
elements.
The overhead would be:
element type - enum
length - byte on some element types, short on some element types, and not
present for some element types.
data - length bytes of data, interpreted differently for each element type.
The beginning of the structure would contain a 2-byte version number and a
4-byte total length, and possibly a signature.
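To make the [type length data] idea concrete, here is a minimal Python sketch. It assumes big-endian byte order, a 1-byte type and a 2-byte length on every element (the proposal lets the length width vary by type, or be absent), and it leaves out the optional signature. The type codes are hypothetical.

```python
import struct

# Assumed layout (the thread leaves the details open):
#   stream header: 2-byte version, 4-byte total length, big-endian
#   each element:  1-byte type, 2-byte length, then `length` bytes of data
STREAM_HEADER = struct.Struct(">HI")
ELEMENT_HEADER = struct.Struct(">BH")

# Hypothetical type codes for the four element kinds.
NOTE, BAR, FORMATTING, HEADER_FIELD = 1, 2, 3, 4


def encode(version, elements):
    """Serialize (type, data) pairs into one byte stream."""
    body = b"".join(
        ELEMENT_HEADER.pack(kind, len(data)) + data for kind, data in elements
    )
    return STREAM_HEADER.pack(version, STREAM_HEADER.size + len(body)) + body


def decode(stream):
    """Read the stream back into (version, [(type, data), ...])."""
    version, total = STREAM_HEADER.unpack_from(stream, 0)
    if total != len(stream):
        raise ValueError("total length does not match stream size")
    elements = []
    offset = STREAM_HEADER.size
    while offset < total:
        kind, length = ELEMENT_HEADER.unpack_from(stream, offset)
        offset += ELEMENT_HEADER.size
        elements.append((kind, stream[offset:offset + length]))
        offset += length
    return version, elements


elements = [(HEADER_FIELD, b"Speed the Plough"), (NOTE, b"\x3c\x02"), (BAR, b"")]
stream = encode(2, elements)
```

A reader that does not understand a given type can still skip it using the length, which is the main attraction of this layout.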
In addition, we could have an array of indexes into the start of each
element. Perhaps another array of indexes into the start of each note
element, so that a MIDI program wouldn't have to wade through non-sounding
elements.
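The index arrays could be built in one pass over the elements. The Python sketch below assumes, purely for brevity, that every element has a 1-byte type and a 1-byte length; the type codes and the function name build_indexes are hypothetical.

```python
# Assumed element layout for this sketch: 1-byte type, 1-byte length, data.
NOTE, BAR = 1, 2  # hypothetical type codes


def build_indexes(body):
    """Return (offsets of all elements, offsets of note elements only)."""
    all_offsets, note_offsets = [], []
    offset = 0
    while offset < len(body):
        kind, length = body[offset], body[offset + 1]
        all_offsets.append(offset)
        if kind == NOTE:
            note_offsets.append(offset)
        offset += 2 + length  # skip the type byte, the length byte and the data
    return all_offsets, note_offsets


# A note, a bar line with no data, then another note.
body = bytes([NOTE, 2, 60, 4, BAR, 0, NOTE, 2, 62, 4])
all_offsets, note_offsets = build_indexes(body)
```

A MIDI program would then walk note_offsets only and never look at the bar or formatting elements.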
The bit fields would be combined in a byte when possible.
The enums are a byte.
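On the question of what a bit field is here: each yes/no property of a note takes one bit, and several of them share a byte. A minimal Python sketch follows, with bit positions chosen arbitrarily for illustration (the thread does not assign them).

```python
from enum import IntFlag


class NoteFlags(IntFlag):
    # Bit positions are an assumption made for this sketch.
    START_TIE = 0x01
    END_TIE = 0x02
    START_SLUR = 0x04
    END_SLUR = 0x08


# A note that starts a tie and ends a slur, packed into a single byte.
flags = NoteFlags.START_TIE | NoteFlags.END_SLUR
packed = bytes([int(flags)])

# Unpacking: test each bit separately.
restored = NoteFlags(packed[0])
starts_tie = NoteFlags.START_TIE in restored
ends_tie = NoteFlags.END_TIE in restored
```

So "start tie" is a single bit that is either set or clear, rather than an array in its own right.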
In the note structure, for each field there is a value that 

Re: [abcusers] ABCp output data structure

2004-09-10 Thread John Walsh
Paul Rosen writes:

elemskip - arbitrary length string.[What does this do?]


Elemskip is the distance between notes, a real number.  It is used by abc2mtex,
but probably not by any other program.  It's good to have the parser accept an
arbitrary string, tho, since if the field is eventually re-cycled, it could be used for
something having only text; then there'd be no backward compatibility problem.

The thing that has always puzzled me about ABC is all the header fields. As
far as I can tell, not all programs treat the headers the same, and some
ignore some of them. Is there a recommended place that each of the header
text fields should go?


Yes---elemskip is a good example---all programs but one ignore it. In the header
section, only the X: and K: fields have fixed positions. (Of course, it is important
whether the fields occur in the header or inline.) But the order of the fields is
purposely flexible; makes parsing harder, perhaps, but it cuts down enough on the errors
in writing tunes to make that worthwhile, especially to musicians. (!)  This goes for a
number of other features of the language, since it's supposed to be both human-readable
and human-writable, as well as machine-readable.  I gather from the comments I read in
these threads that the result is an uncomfortable cross between computer and human
languages, which might be aggravating when you're the one who has to write the parser.
But then, this is yet another reason that a universal parser would be a boon.

There is one major limitation with the data as expressed above: If the point
of the application using it is to modify the file, then comments, line
breaks, and other details are important so that the file looks as much like
the original as possible. In other words, not only should the structure be a
straightforward description of the music, it should have all the information
that is required to write the tune back out identically. For instance, we
should be able to distinguish between C and 4/4 in the time signature. One way
to handle comments, spaces, and line breaks is to have a second structure
that contains them and instructions for inserting them back where they need
to be. Many programs would ignore that; a transposing program wouldn't.


A good point.  Since the notation is supposed to be human readable, you want to
keep just about everything in place--it's difficult to know beforehand what small changes
will confuse a human reader, or, for that matter, for what purposes the parser will be
used.  Secondly, this is a good test of your parser: if you can reproduce the tune from its
representation in the parser, you know that the parser is complete, i.e. it has all the
information it needs.  (In mathematical terms, the mapping abc ---> parsed abc is
invertible.)
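The round-trip test can be written down directly. The Python sketch below is a deliberately trivial "parser" that only classifies lines and keeps each line's raw text, so the inverse mapping is just concatenation. The names parse and unparse and the line categories are hypothetical, and a real parser would store structured music data plus whatever is needed to restore comments and spacing.

```python
import re

# A field line starts with a single letter followed by a colon, e.g. "K:G".
FIELD = re.compile(r"^[A-Za-z]:")


def parse(text):
    """Split ABC text into (kind, raw_line) pairs without discarding anything."""
    items = []
    for raw in text.splitlines(keepends=True):
        if raw.lstrip().startswith("%"):
            kind = "comment"
        elif FIELD.match(raw):
            kind = "field"
        elif raw.strip() == "":
            kind = "blank"
        else:
            kind = "music"
        items.append((kind, raw))
    return items


def unparse(items):
    """Inverse of parse: write the tune back out from its representation."""
    return "".join(raw for _, raw in items)


tune = "X:1\nT:Example\n% a comment\nM:C\nK:G\nGABc dedB|\n"
kinds = [kind for kind, _ in parse(tune)]
round_trip_ok = unparse(parse(tune)) == tune
```

Because the raw text is kept, M:C comes back as M:C rather than M:4/4. The same equality check could serve as a completeness test for a fuller parser.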

Cheers,
John Walsh
