On Sun 14 May 2017 at 15:35:51 (+0100), Phil Holmes wrote:
> ----- Original Message ----- From: "Urs Liska" <u...@openlilylib.org>
> To: <lilypond-user@gnu.org>
> Sent: Sunday, May 14, 2017 3:06 PM
> Subject: Re: XML to .ly and Lilypond, again
> >On 14.05.2017 at 16:03, Phil Holmes wrote:
> >>I've just confirmed Ian Ring's suggestion - removing the copyright
> >>symbol allows the conversion to continue, but results in text with
> >>spurious null characters.

Only some of the text, as I reported in
http://lists.gnu.org/archive/html/lilypond-user/2017-05/msg00241.html

> >But can that be? Shouldn't MusicXML allow arbitrary regular Unicode
> >characters?
>
> My understanding is the XML is like HTML and requires special
> characters to be escaped.

No, the norm is for XML to be written in Unicode as this one is,
hence its header:

  <?xml version="1.0" encoding="UTF-8" standalone="no"?>

So the program should be handling all the data in unicode, and the
problem is the exact opposite of what I started out looking for.

Handling unicode is tricky at best in python2, and I avoided it myself
by switching to python3 before trying to do anything more than printing
unicode to output, which is all musicxml2ly should really be doing.

However, the new version tries to do one clever thing and it's in
split_string_and_preserve_doublequoted_substrings in utilities.py.
This uses the shlex module whose preamble runs:

  The shlex class makes it easy to write lexical analyzers for simple
  syntaxes resembling that of the Unix shell. This will often be useful
  for writing minilanguages, (for example, in run control files for
  Python applications) or for parsing quoted strings.

  Prior to Python 2.7.3, this module did not support Unicode input.

So the fate of the copyright symbol in printer.dump should be to go from

  u'"\xa9"'    ← a unicode value

to

  [u'"\xa9"']    ← a list with one unicode value

but instead it gets mangled to

  ['"\x00\xa9\x00"', '\x00']    ← a list of plain byte strings.

I don't know what the change was meant to fix as I've never used
musicxml in anger.
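The mangled result above looks like what a byte-wise tokeniser would produce if it were handed the two-bytes-per-character (UTF-16-LE style) buffer of the unicode value rather than its characters. That reading is an inference from the output, not something confirmed in the LilyPond or Python 2 sources. The Python 3 sketch below imitates it: `split_preserving_quotes` is a hypothetical stand-in for the helper in utilities.py (the real one may be written differently), and `as_raw_utf16_bytes` is an illustrative helper that turns each byte of the encoding into its own character.

```python
import shlex


def split_preserving_quotes(text):
    """Split on whitespace but keep double-quoted substrings whole.

    Hypothetical stand-in for the helper in utilities.py; the real
    implementation may differ.
    """
    lexer = shlex.shlex(text)       # non-POSIX mode keeps the quote marks
    lexer.quotes = '"'              # only double quotes delimit substrings
    lexer.whitespace_split = True   # split on whitespace only
    lexer.commenters = ''           # do not treat '#' as a comment start
    return list(lexer)


def as_raw_utf16_bytes(text):
    """Present each byte of the UTF-16-LE encoding as its own character.

    Illustrative assumption: this imitates a tokeniser that sees the raw
    buffer of a unicode value instead of its characters.
    """
    return text.encode('utf-16-le').decode('latin-1')


copyright_token = '"\xa9"'

# What the splitter should return: one token, quotes intact.
expected = split_preserving_quotes(copyright_token)

# What it returns when fed the byte-wise view of the same value.
mangled = split_preserving_quotes(as_raw_utf16_bytes(copyright_token))

print(expected)
print(mangled)
```

In the byte-wise view the closing quote is followed by a stray null byte, which is why a second token appears after the quoted one.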
But the easiest patch to get things to work is to replace

  words = utilities.split_string_and_preserve_doublequoted_substrings (str)

with

  words = string.split (str)

in .../lilypond-2.19.…/lilypond/usr/share/lilypond/current/python/utilities.py
assuming you're running a downloaded version rather than one included
in your distribution. (Debian is still installing 2.18 IIRC.)

Cheers,
David.

_______________________________________________
lilypond-user mailing list
lilypond-user@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-user