Hi All,
Because I am getting more and more disappointed with the current state of affairs with respect to downloading OSM content, some people on the Dutch OSM IRC channel thought of an alternative way of distribution that could potentially provide binary diffs against any download made in the past.

I wrote the first implementation of it in the last couple of hours and tested it on the Dutch dataset. The current gzip-compressed data is about 135MB. Extracted, it represents 1.4GB of XML. The binary file is completely analogous to the XML, no shortcuts whatsoever. A first reduction to a binary format containing only the data brings the set down to 418MB, which bzip2 compresses to 78MB.

In principle it is nothing more than:

  N [long id] [float lat] [float lon] [time_t timestamp] [uint length of userfield] [non-terminated userfield]

And likewise for the other subtrees.

As discussed before, it is possible to do a second-pass binary encoding with all strings in a distinct table. The linked list of strings can then be turned into an array that can be recovered from the storage. This would make a significant difference for the tag keys alone. In this case all string fields can be converted to unsigned long fields; for now 4G distinct values seems enough :)

If interested, you can take a peek at:
http://repo.or.cz/w/handlerosm.git?a=tree;f=osmbinary;h=1701a9194285a56e7a91536def314fb8b2e95350;hb=96c7b81af692df89bc6c5eba999e9bb61c92323c

Stefan
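
For anyone who wants to experiment with the node record layout and the string-table idea described above, here is a rough sketch in Python. It is not the code from the repository. The field widths (64-bit id and timestamp, 32-bit floats, 32-bit length), the little-endian byte order, the UTF-8 encoding of the user field, and the names `pack_node`, `unpack_node` and `intern_string` are all assumptions made for illustration.

```python
import struct

# Assumed layout (widths and byte order are NOT taken from the real format):
# 1-byte tag 'N', int64 id, float32 lat, float32 lon,
# int64 timestamp, uint32 length of the user field.
NODE_HEADER = struct.Struct("<cqffqI")


def pack_node(node_id, lat, lon, timestamp, user):
    """Encode one node as a fixed header followed by the unterminated user field."""
    user_bytes = user.encode("utf-8")
    header = NODE_HEADER.pack(b"N", node_id, lat, lon, timestamp, len(user_bytes))
    return header + user_bytes


def unpack_node(buf, offset=0):
    """Decode one node starting at offset; return (fields, offset of next record)."""
    tag, node_id, lat, lon, timestamp, user_len = NODE_HEADER.unpack_from(buf, offset)
    if tag != b"N":
        raise ValueError("not a node record")
    start = offset + NODE_HEADER.size
    end = start + user_len
    user = buf[start:end].decode("utf-8")
    return (node_id, lat, lon, timestamp, user), end


def intern_string(table, index, value):
    """Second-pass idea: map a string to its position in a distinct string table.

    `table` is the array of distinct strings, `index` a dict used for lookup.
    """
    if value not in index:
        index[value] = len(table)
        table.append(value)
    return index[value]
```

With these assumed widths the fixed part of a node is 29 bytes, so a record grows only by the length of the user name. Replacing that name with a number from `intern_string` would make every node record the same size.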

