On Thu, Oct 29, 2009 at 1:31 PM, Matt Amos <[email protected]> wrote:
> On Wed, Oct 28, 2009 at 3:38 PM, Andy Allan <[email protected]> wrote:
>> I would say that if the dump code and
>> http://www.w3.org/TR/REC-xml/#NT-Char
>> are in conflict, there's a bug in the dump code. But since I'm not
>> going to fix it, maybe I'll keep my opinions quiet :-)
>>
>> As for the rails code, there is (AFAIK) no explicit character
>> checking. The server implicitly relies on libxml to ensure the
>> characters in the XML requests and responses are only those allowed by
>> the XML spec above.
>
> there is explicit checking in the potlatch API, as that doesn't go
> through libxml:
>
> http://trac.openstreetmap.org/browser/sites/rails_port/app/controllers/amf_controller.rb#L909

There doesn't seem to be a spec, so everyone's just making it up as they go along. But I'm going to attempt to clarify, with a quote from the W3C:

"In attribute values, the character information items TAB (#x9), newline (#xA), and carriage-return (#xD) are represented by "&#x9;", "&#xA;", and "&#xD;" respectively." (http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html)

Including a tab, newline, or carriage return unescaped in an XML attribute would clearly be incorrect. But as long as it's escaped, it's valid XML.

<tag k="name" v="line 1&#xA;line 2" />

is valid XML. It may or may not represent valid OSM data. This is why I'm saying my question has nothing at all to do with XML.

Apparently under that potlatch code, tabs, carriage returns, and newlines are not allowed in keys or values (I don't actually know ruby/rails enough to say for sure, but that seems to be what Matt just pointed out).

Usernames, on the other hand, apparently *can* contain these characters at this point. Actually changing one's username to include them would require using an input method other than the web page, but I don't see any code to forbid this.

Meanwhile, the planet dump code is silently changing control characters to "?". This could cause problems (for instance, two distinct usernames might wind up being silently changed to identical values), though it would probably require a deliberate attack.

I wonder, what happens if someone enters tabs into keys or values through the API (where there apparently are no checks for this), and then someone tries to edit that data in potlatch? Looks like a denial of service attack to me.

It would be a good idea to release an official spec on exactly what characters are allowed in keys, values, and usernames. Just disallowing control characters (decimal value less than 32) altogether would probably be best. But if the decision is made to allow them, fine, then they need to be handled properly.
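
To make the points above concrete, here is a minimal Python sketch. It is not taken from the OSM, potlatch, or planet dump code; the helper names `has_control_chars` and `dump_style_replace` are hypothetical, and the replacement rule is only my reading of the dump behaviour described above. It shows that an escaped newline in an attribute survives parsing while a literal one does not, and that replacing control characters with "?" can make two different usernames identical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def has_control_chars(text):
    """Return True if text contains any character with code point below 32."""
    return any(ord(ch) < 32 for ch in text)


def dump_style_replace(text):
    """Hypothetical stand-in for the dump behaviour described above:
    every character below 32 is replaced with '?'."""
    return "".join("?" if ord(ch) < 32 else ch for ch in text)


# An escaped newline in an attribute value is preserved by the parser.
escaped = ET.fromstring('<tag k="name" v="line 1&#xA;line 2" />')

# A literal newline is normalized to a space, so the original value is lost.
literal = ET.fromstring('<tag k="name" v="line 1\nline 2" />')

# quoteattr escapes tab, newline, and carriage return when writing attributes
# (it emits decimal references such as &#10;, equivalent to &#xA;).
quoted = quoteattr("line 1\nline 2")

# Two distinct (assumed) usernames that collide after '?' replacement.
name_a = "bob\tsmith"
name_b = "bob\nsmith"

if __name__ == "__main__":
    print(repr(escaped.get("v")))
    print(repr(literal.get("v")))
    print(quoted)
    print(dump_style_replace(name_a) == dump_style_replace(name_b))
```

A check like `has_control_chars` applied at the API boundary would be one way to enforce a "no characters below 32" rule uniformly, regardless of which client submitted the data.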

