Guilherme Salgado wrote:
> Hi John,
>
> I've used meliae to get a memory dump from Launchpad, but when I tried
> to load that dump I got http://paste.ubuntu.com/397273/ (the first line
> there shows the line that causes simplejson.loads() to choke).
>
> From my understanding of [1], this seems to be expected, but I wonder
> how these unpaired surrogates ended up in the dump. Any ideas?
>
> BTW, I did some hacks in my local copy of meliae to replace the
> problematic bits on that line, and after that I was able to load the
> dump. Maybe with that I could try and find out where the unpaired
> surrogates are coming from?
>
> [1] <http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates>
>
> Cheers,
>

I'm mostly offline on vacation right now, but I'll try to help out when I
get back.

I can think of a few possible causes:

1) I trim most output to 100 characters. (So if you have a 1,000 byte
   string, I only output 100 bytes.) It is possible that a Unicode
   surrogate pair sat at positions 100 and 101 and got split by the
   truncation, leaving an unpaired surrogate behind.
2) I use a pretty stupid method for encoding 8-bit strings, just mapping
   each byte to the Unicode code point with the same value
   ('\xff' => U+00FF). Some of that may be invalid.
3) Other bugs I don't even know about... :)

I'm happy to debug this with you sometime soon. (If you're getting this,
it probably means I'm back home, rather than offline in an airport.)

John
=:->
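Cause 1 can be sketched in a few lines. This is only an illustration, not meliae's actual code: the helper names (truncate_code_units, replace_lone_surrogates) are hypothetical, it is written for Python 3, and it assumes the string is stored as UTF-16 code units, as on a narrow Python 2 build. Cutting at a fixed number of code units can split a pair, and a small regex can then find and replace whatever is left unpaired:

```python
import re

# A high surrogate not followed by a low one, or a low surrogate not
# preceded by a high one.
_LONE_SURROGATE = re.compile(
    r'[\ud800-\udbff](?![\udc00-\udfff])'
    r'|(?<![\ud800-\udbff])[\udc00-\udfff]'
)


def truncate_code_units(text, limit):
    """Cut text to `limit` UTF-16 code units (hypothetical helper).

    This imitates slicing a unicode string on a narrow build, where a
    character outside the BMP occupies two code units.
    """
    raw = text.encode('utf-16-le', 'surrogatepass')
    # Each UTF-16 code unit is two bytes.
    return raw[:limit * 2].decode('utf-16-le', 'surrogatepass')


def replace_lone_surrogates(text, replacement='\ufffd'):
    """Replace every unpaired surrogate with `replacement`."""
    return _LONE_SURROGATE.sub(replacement, text)


# 99 ASCII characters followed by one character outside the BMP:
# the cut at 100 code units lands in the middle of the surrogate pair.
sample = 'a' * 99 + '\U0001F600'
cut = truncate_code_units(sample, 100)
cleaned = replace_lone_surrogates(cut)
```

Here cut ends with the lone high surrogate U+D83D, and cleaned ends with U+FFFD instead. As for cause 2, a one-to-one mapping of bytes to code points only produces values up to U+00FF, so on its own it cannot yield a surrogate (those live in U+D800..U+DFFF), though it may produce other characters a loader dislikes.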

