On Wed, Sep 29, 2010 at 6:47 PM, Stefan de Konink <ste...@konink.de> wrote:
> On Thu, 30 Sep 2010, Frederik Ramm wrote:
>> Speaking of "polished": The program currently produces invalid XML because
>> " and & are not escaped, leading to lines like
>
> Yes, Roeland pointed that out as well yesterday. We have discussed an escape
> table. Maybe first parsing the entire string table, alternatively doing it
> for each instance.

In addition to " and &, you need to escape <. planet.c also escapes >. It uses a reference for each (&quot;, &amp;, &lt;, and &gt;).

planet.c also escapes carriage return, line feed, and tab, as numeric character references (&#13;, &#10;, and &#9; in decimal form). AFAICT it is legal to include these unescaped (though it would be nice to escape at least line feeds to make it easier on fast, non-XML-compliant parsers).

Finally, there are characters in the db which cannot be represented in XML 1.0 (but can be represented in XML 1.1), most significantly control characters (ASCII less than 32) other than carriage return, line feed, and tab. Some versions of planet.c convert these into ?. Some versions omit them completely. At least one version converts them into &#ASCII;, where ASCII is the ASCII code. I actually like the last version the best, though it is invalid in XML 1.0 (valid in XML 1.1). Personally I'd recommend producing XML 1.1, at least as an option, in order to include these characters.

I don't believe there are any null characters in the database. These could not be represented in either XML 1.0 or XML 1.1.
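
To make the escaping rules above concrete, here is a rough sketch in C. It is not the actual planet.c code: the function name xml_escape, the ctrl_policy enum, and the buffer-based interface are all made up for illustration, and the decimal form of the character references is an assumption. It escapes ", &, <, and >, turns carriage return, line feed, and tab into character references, and handles the remaining control characters in one of the three ways described (omit, replace with ?, or emit &#N;).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical policy for control characters (< 32, other than CR, LF,
 * and tab) that XML 1.0 cannot represent. */
enum ctrl_policy { CTRL_OMIT, CTRL_QUESTION_MARK, CTRL_CHAR_REF };

/*
 * Write the escaped form of the NUL-terminated string `in` into `out`
 * (capacity `outsz`, including the terminator).
 * Returns 0 on success, -1 if the output buffer is too small.
 * Bytes >= 32 that are not special (including UTF-8 sequences) are
 * copied through unchanged.
 */
int xml_escape(const char *in, char *out, size_t outsz, enum ctrl_policy policy)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        char ref[8];            /* big enough for "&#31;" or one plain byte */
        const char *rep = NULL;

        switch (*p) {
        case '"':  rep = "&quot;"; break;
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '\r': rep = "&#13;";  break;
        case '\n': rep = "&#10;";  break;
        case '\t': rep = "&#9;";   break;
        default:
            if (*p < 32) {
                if (policy == CTRL_OMIT)
                    continue;   /* drop the character entirely */
                if (policy == CTRL_QUESTION_MARK) {
                    rep = "?";
                } else {
                    /* Only well-formed if the document is XML 1.1. */
                    snprintf(ref, sizeof ref, "&#%u;", (unsigned)*p);
                    rep = ref;
                }
            }
        }

        if (rep == NULL) {      /* ordinary byte: copy as-is */
            ref[0] = (char)*p;
            ref[1] = '\0';
            rep = ref;
        }

        size_t len = strlen(rep);
        if (used + len + 1 > outsz)
            return -1;
        memcpy(out + used, rep, len + 1);   /* copies the terminator too */
        used += len;
    }
    return 0;
}
```

With CTRL_CHAR_REF the output is only well-formed if the file declares itself as XML 1.1 (<?xml version="1.1"?>), so that mode would fit the "at least as an option" suggestion. A null character can never reach the output here, since the input is a NUL-terminated string.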