Ævar Arnfjörð Bjarmason wrote:
> * Potlatch will enter whatever raw binary string the user
>   supplies into the database that the main API would reject
>   as an invalid request, hence the corrupt data

Sort of. From a client point of view, the bug you filed is that Linux Flash Player has long been broken beyond belief and doesn't permit non-ASCII characters to be entered into a textfield. (See http://bugs.adobe.com/jira/browse/FP-40 .)

This morning's problem is actually a different issue AFAICT. Potlatch (the SWF client) has long used an ActionScript method, textField.restrict, to prevent control characters (0x00-0x1F) being input into textfields. Unfortunately the latest version of Ming (the open-source Flash compiler used to compile Potlatch), 0.4.2, appears to be broken and will not compile textField.restrict correctly: it randomly uppercases character input (letters D to U, IIRC), which is a whole heap of no good for entering tags. (See http://bugs.libming.org/show_bug.cgi?id=88 .)

Consequently, when I needed to commit a new revision of Potlatch at SOTM and only had a laptop with 0.4.2 installed, this check was temporarily removed. It'll be back in this evening, now that I'm back with a machine with Ming 0.3 on it.

As I mentioned to you the other day, it would be really useful if some Linux-using OSMers could expand the reports at http://trac.openstreetmap.org/ticket/1936 so we can find exactly _how_ FP for Linux is breaking encoding, and fix it either in Potlatch or at the API.

From the two examples you give, for two-byte UTF-8, it appears to be adding 0x03 before the first byte and 0x83 0xC2 after it. But we need to work out whether this is a universal pattern for all two-byte UTF-8 sequences, and what happens with longer sequences. This should be fairly trivial for someone with the Rails port installed on a Linux machine, I'd hope.

> And as has been pointed out there's an ambiguity as to what
> sequences of bytes can be written to the database whether that
> be full UTF-8 or some XML subset of it.

Indeed.
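
For anyone collecting samples for the ticket, here is a rough Python sketch of how an observed byte string could be compared against the pattern guessed above. It is not code from Potlatch or the Rails port. The function names are made up for illustration, and the transform is only the hypothesis drawn from the two examples (0x03 before the first byte, 0x83 0xC2 after it), not a confirmed description of what Flash Player does. The last helper mirrors the 0x00-0x1F range that textField.restrict is meant to block.

```python
def hypothesised_corruption(char: str) -> bytes:
    """Apply the guessed pattern to one character that is two bytes in UTF-8.

    Assumption: 0x03 is added before the first byte and 0x83 0xC2 after it.
    This is unverified beyond the two reported examples.
    """
    raw = char.encode("utf-8")
    if len(raw) != 2:
        raise ValueError("pattern is only hypothesised for two-byte UTF-8 sequences")
    return bytes([0x03, raw[0], 0x83, 0xC2, raw[1]])


def matches_hypothesis(char: str, observed: bytes) -> bool:
    """True if the bytes seen in the database fit the guessed pattern."""
    return observed == hypothesised_corruption(char)


def has_control_chars(value: str) -> bool:
    """True if the string contains any character in the range 0x00-0x1F."""
    return any(ord(c) <= 0x1F for c in value)
```

Calling matches_hypothesis with the character typed and the bytes that ended up stored, over a handful of different two-byte characters, would show quickly whether the pattern holds in general. Longer sequences need their own samples before any pattern can be guessed.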

cheers
Richard

