On Mon, May 23, 2011 at 04:18:05AM -0500, Scott Crosby wrote:
> On Sat, May 21, 2011 at 9:52 AM, Jochen Topf <[email protected]> wrote:
> >
> > If we use unsigned ints we have some more time. Problematic would only be
> > a few cases where negative IDs are currently used (like in JOSM for data
> > that's not yet uploaded to the server). But it seems wasteful to me, to go
> > to 64bit a year or so earlier than needed to accommodate this case.
>
> The 64 bit transition is unavoidable. I think this would double the
> effort, because we'd all have to go through our software twice, once
> to fix signedness bugs, and a second time to go to 64 bits. In
> addition, the Java stack couldn't transition to unsigned ints anyways,
> as Java lacks unsigned types. An unsigned int transition would be a
> 64-bit transition.
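A minimal Python sketch (not from the thread) of the numbers behind this exchange: how far signed and unsigned 32bit IDs reach, why negative placeholder IDs rule out an unsigned type, and what a billion IDs cost at each width. The helper names fits() and storage_bytes() are hypothetical; the format codes are standard struct codes, which have fixed sizes when the "<" prefix is used.

```python
import struct

# Limits of the integer widths discussed in this thread.
INT32_MAX = 2**31 - 1    # largest signed 32bit ID
UINT32_MAX = 2**32 - 1   # largest unsigned 32bit ID (no room for negative IDs)
INT64_MAX = 2**63 - 1    # largest signed 64bit ID (PBF defines IDs as sint64)

def fits(osm_id: int, fmt: str) -> bool:
    """Return True if osm_id can be packed with the given struct format code."""
    try:
        struct.pack(fmt, osm_id)
        return True
    except struct.error:
        # struct raises struct.error when the value is out of range for fmt
        return False

def storage_bytes(count: int, fmt: str) -> int:
    """Bytes needed to store `count` IDs packed with the given struct format."""
    return count * struct.calcsize(fmt)

if __name__ == "__main__":
    # A negative placeholder ID (as JOSM uses for not-yet-uploaded data)
    print(fits(-1, "<i"), fits(-1, "<I"))                          # True False
    # The first ID past the signed 32bit range still fits in unsigned 32bit
    print(fits(INT32_MAX + 1, "<i"), fits(INT32_MAX + 1, "<I"))    # False True
    # One billion IDs in decimal gigabytes: 4 bytes vs 8 bytes per ID
    print(storage_bytes(10**9, "<i") / 1e9, storage_bytes(10**9, "<q") / 1e9)  # 4.0 8.0
```

Going unsigned buys exactly one more bit (about 2.1 billion extra IDs), but only if no negative IDs ever need to be represented in the same type.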
First: It has always been clear that sooner or later we will need the 64bit space for OSM IDs. The file formats used for exchanging OSM data already allow for it: in XML there is really no limit on the size of an ID, and in PBF IDs are defined as sint64. So we are fine there. But in practice, people have often used 32bit IDs in their software instead, because a) 32bit is currently enough and b) it is often more efficient in space and/or time.

I think it is up to the implementor of each piece of software to decide what internal representation to use for IDs. Implementors just have to be aware of all the issues involved. One problem with 64bit IDs is simply that they need twice as much space: if you store a billion node IDs, that is the difference between needing 4GB and 8GB of RAM. So I think it is worth trying to live with 32bit IDs as long as possible. Hardware is getting cheaper, so preserving 32bit IDs for another year might mean investments can be postponed, or that we can actually do things we could not do otherwise because there is no money for more hardware.

The negative IDs throw a bit of a wrench into this whole thing. I can think of only one way to solve this: define a set of, say, 10,000 IDs for the use cases where negative IDs are currently used. The implementation on the API side would be trivial: manually advance the Postgres counter that hands out IDs past that range, and check in the API that nobody can write IDs in it. Changing all the software that currently uses negative IDs would be a bit more difficult. This would give us that extra bit at the price of a few thousand reserved IDs. And it would be rather ugly. I can't say I really like that idea. So we are probably stuck with the negative IDs. But I could well imagine people writing software that does not work with negative IDs so that it can keep using 32bit IDs a while longer.

And while we are on the subject, there is another problem here.
Most of the usual GIS software uses 32bit IDs. When using QGIS with Postgres, for instance, it would not accept a 64bit Postgres ID column. (This might have been fixed in the meantime; I haven't checked for a while.) I have talked about this on several occasions to the people who work on these projects, and they all said they'd work on it. But in the meantime there is an awful lot of software around that can't handle this case. Oh, yes, and shapefiles only allow 32bit IDs.

Jochen

-- 
Jochen Topf  [email protected]  http://www.remote.org/jochen/  +49-721-388298

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev

