On Sat, 2008-03-29 at 12:41 +0100, Frederik Ramm wrote:
> Hi,
>
> > In the file daily-20080326-20080327.osc.bz2 there is this relation:
> >
> > <relation id="8571" timestamp="2008-03-26T22:05:03Z" user="wiesel111">
> > <tag k="ESCESC" v=""/>
> > <tag k="created_by" v="Potlatch 0.8"/>
> > <tag k="type" v=""/>
> > </relation>
> >
> > Those are real escapes "\x1d". Fetching via the API doesn't have them,
> > the osmosis XML parser is barfing on them. Looks like some mismatch
> > between the output and input of osmosis here.
>
> Seems to be two problems in one, first: how did the key get in there
> in the first place, second: why does it not get exported in a way that
> Osmosis can read.
>
> I was hoping to fix the diff by simply running "recode" on it and
> instructing it to ignore invalid characters, however I was surprised
> to see that recode converted the file from UTF8 ut UTF16 without
> complaint (and back again to give an identical file). - Would running
> one of the many existing "UTF8 sanitizers" have resolved the problem?
Character 27 (ESC) is valid UTF-8, but it is not allowed as content within an XML document:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char

More details, and some Java code that might be useful for Osmosis:
http://cse-mjmcl.cse.bris.ac.uk/blog/2007/02/14/1171465494443.html

I dumped the same data myself with the planet dump tools, and they produce the same invalid output. I have added a line to the planet dump code that replaces this character with a "?". Now that I have found the links above, I should perhaps add an even stricter test that drops every character below 32 except 9 (tab), 10 (line feed) and 13 (carriage return).

Jon
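For illustration, a minimal sketch of the stricter test in Java, assuming the goal is to keep only code points matched by the XML 1.0 "Char" production linked above and to replace everything else with "?", as the planet dump change does. The class and method names (XmlCharSanitizer, isXmlChar, sanitize) are hypothetical; this is not the actual Osmosis or planet dump code. Note that the full production is slightly stricter than "below 32 except 9, 10 and 13": it also rejects U+FFFE, U+FFFF and unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

public class XmlCharSanitizer {

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every code point that is not allowed in XML 1.0 content with '?'. */
    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // codePoints() joins surrogate pairs; an unpaired surrogate comes through
        // as its own value (0xD800-0xDFFF) and is therefore replaced as well.
        s.codePoints().forEach(cp -> {
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append('?');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String key = "\u001b\u001b"; // two ESC characters, as in the relation's tag key

        // ESC encodes to a single ordinary UTF-8 byte (0x1B), which is why a
        // charset converter such as recode has nothing to complain about.
        byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes: " + utf8.length);

        // It is still not an XML Char, so it has to be replaced before writing XML.
        System.out.println("sanitized key: " + sanitize(key));
        System.out.println("tab kept: " + sanitize("a\tb").equals("a\tb"));
    }
}
```

Running main should report 2 UTF-8 bytes, a sanitized key of "??", and that the tab survives. Whether to replace with "?" or drop the character entirely is a policy choice; replacing keeps the key visibly non-empty, while dropping would turn this key into an empty string.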

