Sorry, my last mail had the wrong From header.

Joe Austen:
> > On 24.02.2017 at 02:15, Joseph Austin wrote:
> >> This raises another question. I'm working with MIDI files,
> >> and it's not clear how to encode UTF-8 text in MIDI.
> >> There must be some convention, but I haven't found an official RP for it.
...
> I don't have a program that displays MIDI files with lyrics, so I can't test
> it.

Timidity will show the lyrics.

I have a simple program that dumps the midi as text:
 http://aspodata.se/git/musik/bin/midi.pl

$ midi.pl test.midi | grep lyric | head
['lyric', 0, 'Sta'],
['lyric', 768, 'bat '],
['lyric', 768, 'Ma'],
['lyric', 768, 'ter '],
['lyric', 384, 'do'],
['lyric', 768, 'lo'],
['lyric', 384, 'ro'],
['lyric', 768, 'sa '],
['lyric', 384, 'sa '],
['lyric', 384, 'jux'],
$

> It appears that, when generating a MIDI file, LilyPond currently
> just puts UTF8 chars in the text fields as if they were ASCII.
> According the base MIDI spec, this is illegal; only ASCII chars
> between 0 and 127 are allowed.

Your wording is too strong. complete_midi_96-1-3.pdf, p. 137 (or [1]
p. 10) clearly says "should", but

 "other characters codes using the high-order bit may be used for
  interchange of files between different programs on the same
  computer which supports an extended character set. Programs on a
  computer which does not support non-ASCII characters should ignore
  those characters."

[1] http://www.cdik.se/pdf/midiformat.pdf

Also, the last paragraph of rp17.pdf gives you the set of characters
that are "accepted for use" and says that "it is best to avoid the use
of these characters: \ [ ] { }".

And rp26 clearly states in section 5:

 In addition, if a byte order mark which specifies UNICODE such as
 'FF FE' or 'FE FF' exists, the character code SET should be treated
 as UNICODE.

There is such a "byte order mark" for utf8, see [2]. By extension,
you just have to insert that BOM somewhere in the midi file ("exists"
means it is not restricted to the lyrics meta event; preferably put it
in track 0 at time 0), and it would be legal (according to the
recommendation) to use utf8 straight out of the box.

[2] http://www.unicode.org/faq/utf_bom.html#BOM

> However, MIDI RP-17 and RP-26 introduce additional encodings for
> the <text> portion of the lyric meta-event FF 05 <len> <text>.

You do extrapolate a little: rp17 tells you the "recommended" way to
specify end of word/line/paragraph, and gives you a list of characters
that should give no compatibility problems.

> In particular, RP-26 specifies the "language" code {@LATIN} to
> include 8-bit chars > 127. It seems no code for "UTF8" has been
> officially defined, but a reasonable proposal might be language code:
> {@UTF8}.

You don't need that, see above about the BOM. Also, it would be
interesting to see which programs actually support rp26. Since midi
"standards" are just recommendations, you have to know what works in
the wild.

..
> So for LilyPond purposes, it would suffice to use a reversible
> encoding, that is, LilyPond would accept any MIDI file text format
> that LilyPond generates. The apparently existing UTF-8 default
> should work for that.

LilyPond doesn't read midi files; you can convert midi files to ly
files, which LilyPond can then read.

> But if we are going to use a "private standard", we might as well
> imitate the "official" standard and insert something like
>   FF 05 07 { @ U T F 8 }
> And lobby AMEI/MMA to adopt an official UTF8 position.

Could be good, but why not just capitalize on the BOM and use utf8?

Regards,
/Karl Hammar

-----------------------------------------------------------------------
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57
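
A minimal sketch of the BOM idea discussed above, in Python, for anyone
who wants to try it against Timidity or another player. It builds a
one-track midi file whose lyric events (FF 05 <len> <text>) carry raw
utf8 bytes. Everything here is an assumption for illustration, not
something rp17 or rp26 spells out: the function names are made up,
the BOM is placed in a text meta event (FF 01) at time 0, the division
of 384 ticks is arbitrary, and the syllables are invented. Whether any
program actually honors a BOM placed this way is untested.

```python
import struct

UTF8_BOM = b"\xef\xbb\xbf"  # utf8 encoding of U+FEFF


def vlq(n: int) -> bytes:
    """Encode a non-negative integer as a midi variable-length quantity."""
    out = [n & 0x7F]                     # last byte has the high bit clear
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)    # earlier bytes have the high bit set
        n >>= 7
    return bytes(reversed(out))


def meta(delta: int, kind: int, data: bytes) -> bytes:
    """Delta time followed by a meta event: FF <type> <len> <data>."""
    return vlq(delta) + bytes([0xFF, kind]) + vlq(len(data)) + data


def lyric_track(syllables, with_bom: bool = True) -> bytes:
    """Build an MTrk chunk from (delta_ticks, text) pairs."""
    events = b""
    if with_bom:
        # Assumption: carry the BOM in a text meta event (FF 01) at time 0.
        events += meta(0, 0x01, UTF8_BOM)
    for delta, text in syllables:
        # Lyric meta event (FF 05) with the text as plain utf8 bytes.
        events += meta(delta, 0x05, text.encode("utf-8"))
    events += meta(0, 0x2F, b"")         # end of track
    return b"MTrk" + struct.pack(">I", len(events)) + events


def midi_file(track: bytes, division: int = 384) -> bytes:
    """Wrap a single track in a format 0 header chunk."""
    return b"MThd" + struct.pack(">IHHH", 6, 0, 1, division) + track


# Invented syllables with a non-ASCII character, deltas in ticks.
example = [(0, "Öst"), (768, "ham"), (384, "mar ")]
data = midi_file(lyric_track(example))
```

Writing `data` to a file in binary mode gives something a midi dumper
or player can be pointed at; the file contains lyric events only, no
notes. Passing `with_bom=False` gives the plain "just use utf8" variant
for comparison.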
