One point that came up in the PBF discussion was the questions of ordering OSM files. Lets try to analyze that and try to separate the different issues:
1. Order of object types OSM files contain three different kinds of objects: nodes, ways, and relations. In the future new types of objects might be added, such as a native area type. Not all files contain all object types, for instance relations can be missing if there are no relations in a bbox. It makes processing easier if all objects of one type are bunched together. It doesn't have to be this way, but it is almost always the case currently. Note that the PBF format is optimized for that case with its structure of blobs. Bunching objects of the same type together is probably always better for applications reading the data. For some applications it might not matter, but for others it makes processing a lot easier (because you can be sure you have read all data of a specific type before you start processing a different type). On the other hand writing data might be much easier when you don't have to take this into account. You can, for instance, basically just concatenate two files to get a new OSM file. I think the use cases where you want to rely on the ordering are important enough to warrant some kind of flag that promises you that the objects are bunched together according to object type. The next question is what the order of the object types should be. Currently almost all OSM files have first all nodes, then all ways, then all relations. That order makes sense, for instance, if you want to be able to easily cut a bbox from the file. If you wanted to assemble multipolygons, a different order would be better (relations, ways, nodes or relations, nodes, ways). I am not sure we can ever decide on one particular "best" order here. But maybe it would make sense to allow different orders and note them down in the header. This way a program could decide in the beginning how many passes it needs over the data or maybe even use a different algorithm depending on the order. 2. Order of objects There is another order issue. Objects themselves can be ordered or not. Currently most OSM files have the objects (nodes, ways, and relations) each ordered according to ID. I guess that most programs do not depend on that order, but some might. And there might be some optimizations that can be done if the order is known, for instance you could stop reading a file if you see IDs larger than the largest ID you are interested in for some reason. Other orderings instead of ID could be possible, for instance by timestamp. I suggest we add an optional flag to the file saying "objects are ordered according to ID". If we have those additional flags an application can directly see whether it can read a file (efficiently). Most applications will probably work with any order, but in case an application can't work without that order it can immediately give an error message and stop. The user can then use Osmosis or some other program that can order the file in the necessary order and re-run the application. To optimize the case where the order is wrong for an application a bit further I think we also want the flags outside the compressed blobs to tell us whats inside. I agree that thats a cleaner solution than storing the offset of the first blobs with certain object types somewhere. Another option would be a special block that says: The following blocks are all ob object type X until you get some special end block (or the start block for a different object type). This would reduce the overhead on each block slightly for the price of having an additional block type. I am not sure which solution is best. Jochen -- Jochen Topf [email protected] http://www.remote.org/jochen/ +49-721-388298 _______________________________________________ dev mailing list [email protected] http://lists.openstreetmap.org/listinfo/dev

