On Thu, Oct 29, 2009 at 6:16 PM, Anthony <[email protected]> wrote: > On Thu, Oct 29, 2009 at 1:31 PM, Matt Amos <[email protected]> wrote: >> On Wed, Oct 28, 2009 at 3:38 PM, Andy Allan <[email protected]> wrote: >>> I would say that if the dump code and >>> http://www.w3.org/TR/REC-xml/#NT-Char >>> are in conflict, there's a bug in the dump code. But since I'm not >>> going to fix it, maybe I'll keep my opinions quiet :-) >>> >>> As for the rails code, there is (AFAIK) no explicit character >>> checking. The server implicitly relies on libxml to ensure the >>> characters in the XML requests and responses are only those allowed by >>> the XML spec above. >> >> there is explicit checking in the potlatch API, as that doesn't go >> through libxml: >> >> http://trac.openstreetmap.org/browser/sites/rails_port/app/controllers/amf_controller.rb#L909 > > There doesn't seem to be a spec, so everyone's just making it up as > they go along.
sort of. the "spec" is just that the API talks XML, and we aim not to have any restrictions beyond that. so anything that's a valid XML character (http://www.w3.org/TR/REC-xml/#charsets) should be allowed in OSM. > But I'm going to attempt to clarify, with a quote from W3: "In > attribute values, the character information items TAB (#x9), newline > (#xA), and carriage-return (#xD) are represented by "	", "
", > and "
" respectively." > (http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html) > > Including a tab, newline, or carriage return unescaped in an xml > attribute would clearly be incorrect. But as long as it's escaped, > it's valid xml. <tag k="name" v="line 1
line 2" /> is valid xml. > It may or may not represent valid OSM data. This is why I'm saying my > question has nothing at all to do with XML. but there are other escaped values which are invalid XML (e.g: �). > Apparently under that potlatch code, tabs, carriage returns, and > newlines are not allowed in keys or values (I don't actually know > ruby/rails enough to say for sure, but that seems to be what Matt just > pointed out). the code, which is a bit opaque: delete "\000-\037", "^\011\012\015" says that it'll delete any char in the range 0-37 (octal), but not 11, 12 or 15. in ascii this corresponds to anything "less than" a space, but not tab, newline or carriage return. it should be (modulo bugs) the same as the XML valid character productions. > On the other hand, usernames apparently *can*, at this > point, contain these characters. Actually changing one's username to > include them would require using an input method other than the web > page, but I don't see any code to forbid this. yes, we should probably add a rails validation to stop people using those chars in their username (who needs a tab in their username anyway?) > On the other hand, the planet dump code is silently changing control > characters to "?". This could cause problems (for instance, two > usernames might wind up being silently changed to identical values), > though it would probably require a deliberate attack. the planet dump format includes the uids, so this case would be detectable. however, this behaviour in planet dump is a bug and i'll have a go at fixing it. > I wonder, what happens if someone enters tabs into keys or values > through the API (where there apparently are no checks for this), and > then someone tries to edit it in potlatch? Looks like a denial of > service attack to me. there don't need to be any checks - the xml is parsed using an xml parser, which would barf if those chars were in the API calls. potlatch also can't enter those chars via its API. i don't think there's any attacks possible here, but i'd like to know if there are! > It would be a good idea to release an official spec on exactly what > characters are allowed in keys, values, and usernames. Just > disallowing control characters (decimal value less than 32) altogether > would probably be the best. But if the decision is made to allow > them, fine, they need to be handled properly. that's a good idea. i'll stick something up on the wiki - for reference i think the current "spec" is the XML valid character productions, although i can't think of any particular reason to keep \t, \n or \r: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] http://www.w3.org/TR/REC-xml/#NT-Char cheers, matt _______________________________________________ dev mailing list [email protected] http://lists.openstreetmap.org/listinfo/dev

