On Mon, Dec 21, 2009 at 6:44 AM, Jon Burgess <jburgess...@googlemail.com> wrote:

> On Mon, 2009-12-21 at 01:08 -0500, Anthony wrote:
> > Cool.  If anyone familiar with the planet dumper tool is listening...
> >
> > In
> >
> > http://svn.openstreetmap.org/applications/utils/planet.osm/C/output_osm.c
> >
> > } else if ((*in >= 0) && (*in < 32)) {
> >             escape_tmp[len] = '?';
> >             len++;
> >
> > should be something like
> >
> > } else if ((*in > 0) && (*in < 32)) {
> >             len+=sprintf(&escape_tmp[len], "&#%d;", *in);
> >
> > "Something like" as in I haven't even checked if that compiles :).
>
> Most of the control characters are not allowed in a valid XML file. It
> makes no difference whether they are present as an ASCII character or as
> the equivalent entity.
>

Ah yes.  Hmm.  That said, most of the characters actually in the database
are carriage returns, which along with tabs and line feeds (also in the db)
are valid in XML.  Other characters are present - for instance ASCII 3 in
http://www.openstreetmap.org/browse/changeset/1325382 - those will be more
of a problem.

Hopefully the database can be cleaned of the rest of the characters, because
I'd imagine each dumper is going to have a slightly different way of dealing
with them.  Until that's done, I guess there's no right answer.
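To make the distinction concrete, here is a rough sketch of an escaper that writes tab, line feed and carriage return as numeric character references and still substitutes '?' for the other control characters. This is not the actual output_osm.c code: the function name xml_escape_sketch, its signature, and the set of named entities it handles are all made up for illustration, and whether '?' is the right substitute for the remaining characters is exactly the open question above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from output_osm.c): escape `in` into `out`,
 * which holds `cap` bytes including the terminating NUL.
 * Returns the number of bytes written, excluding the NUL.
 * Stops early instead of overflowing when `out` is too small.
 */
static size_t xml_escape_sketch(const char *in, char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return 0;

    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;
        char tmp[8];
        const char *rep;
        size_t n;

        switch (c) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\t':
        case '\n':
        case '\r':
            /* Allowed in XML: emit as a numeric character reference. */
            snprintf(tmp, sizeof tmp, "&#%d;", c);
            rep = tmp;
            break;
        default:
            if (c < 32) {
                /* Other control characters are not valid XML 1.0. */
                rep = "?";
            } else {
                /* Ordinary byte (including UTF-8 sequences): copy as is. */
                tmp[0] = (char)c;
                tmp[1] = '\0';
                rep = tmp;
            }
            break;
        }

        n = strlen(rep);
        if (len + n >= cap)
            break;              /* no room left for this piece plus NUL */
        memcpy(out + len, rep, n);
        len += n;
    }

    out[len] = '\0';
    return len;
}
```

With this, a value containing a carriage return comes out as "&#13;" and a value containing ASCII 3 comes out with a '?' in its place.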


> > Of course, another thing to consider is that 1024 bytes isn't enough
> > for the truly pathological cases.  I think you need like 1531 or
> > something to handle that.  Fixing this might be enough to properly
> > process the current db, though.
>
> How do you arrive at the 1531 number?
>

strlen("&quot;")*255+1

Not sure if that's the absolute longest encoded string.  But 255 quotes
makes a valid key/value, and the planet dumper would truncate it, right?
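Spelled out as code, the arithmetic is just the following. It is a sketch under the assumption that a key or value holds at most 255 characters and that "&quot;" (6 bytes) is the longest replacement the escaper ever emits for a single character; the macro and function names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: longest key/value, counted in characters. */
#define MAX_TAG_CHARS 255

/*
 * Worst-case size in bytes of an escaped key/value, including the
 * terminating NUL, if every character expands to "&quot;".
 */
static size_t worst_case_escaped_size(void)
{
    return strlen("&quot;") * MAX_TAG_CHARS + 1;   /* 6 * 255 + 1 */
}
```

That gives 6 * 255 + 1 = 1531. A numeric reference such as "&#13;" is 5 bytes and a UTF-8 character is at most 4 bytes, so neither pushes the bound any higher.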

> > Any chance of adding num_changes?
>
> The current output reflects the same information as the /changeset API
> call. Do you think it should be there too?
>

Not as a bug, but as a feature request, I guess so.  It's more useful in the
dumps than the API (you can use it to make sure you've got everything
downloaded), but it'd be useful in the API as well, I suppose.  It seems to
be in the DB, so there shouldn't be a performance impact, right?

I see it's mentioned on http://wiki.openstreetmap.org/wiki/.osm
