On Fri, Jan 24, 2014 at 2:56 PM, Peter Körner <[email protected]> wrote:
> On 23.01.2014 18:28, Matt Amos wrote:
>> i encourage everyone to take a look and report back any problems you
>> find. my thanks to Peter Körner, who seems to already be doing this -
>> with no problems?
>
> No problems yet. Have run two splits of those files already
> (http://osm.personalwerk.de/full-history-extracts/)
>
> What do you think about adopting the osmium naming scheme for history files?
> .osm.[bz2|gz|pbf] -> regular osm files
> .osh.[bz2|gz|pbf] -> history files
> .osc.[bz2|gz]     -> changeset files
>
> That would make detecting the kind of file at first glance easier, and
> it also fits nicely into the .osc file naming convention.
personally, i think it's misleading. osmchange is a related, but different, format from osm xml, and a parser which works for one will not necessarily work for the other, so having a different extension there seems reasonable. in the case of .osm files, they're all potentially history files, and the file format does not change depending on whether multiple versions are present for a single ID or not. whether something is a "history" osm file or a "current" osm file is a matter of the content - so wanting a different extension is a bit like wanting .png for truecolour images and .pgr for greyscale images (both in the same PNG format).

having said that, it would seem reasonable to add a flag to the document element to indicate when a .osm file is the special case of having a single version for each ID, as many programs seem to rely on this assumption and it would be better to be able to check it.

> I'm going to implement a regular run that generates fresh extracts every
> week from the available file. Is there any note on which weekday the
> full-history-dumps are generated, so I can loosely sync my split-script
> to that rhythm?

great! :-)

the generation is synced to the backup database dumps, so the clock starts running early on Tuesday, when Monday's backup is complete. they seem to be fairly reliably finished by Wednesday morning, so it's probably safe to start looking for them then - although they'll be named for Monday's date.

> If xml-writing takes only half as long as xml-reading I'd double-think
> about supplying xml-based files. nobody really has fun reading such huge
> files with expat. And if it's really necessary, there's always
> osmium_convert which will generate xmls from pbf-dumps or -extracts locally.

this is a discussion which could probably continue forever.
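to illustrate the idea, such a flag might look something like the following. the "single_version" attribute here is purely hypothetical - nothing like it exists in the current osm xml schema - but it would let consumers check the one-version-per-ID assumption instead of silently relying on it:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "single_version" is a hypothetical attribute: it would assert that
     each element ID appears at most once in this file, i.e. that this
     is a "current" rather than a "history" extract. -->
<osm version="0.6" generator="example" single_version="true">
  <node id="1" version="3" timestamp="2014-01-20T12:00:00Z"
        lat="51.5074" lon="-0.1278" visible="true"/>
</osm>
```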
my opinion is that it's worthwhile distributing files which are sort-of human-readable, in a well-known format/markup for which many libraries exist in many languages, compressed with standard tools, and in the same format as the API. this way, it's possible for people to develop tools which work against small map call downloads, then scale them to extracts and even the whole planet.

of course, it's a widely-held belief that xml sucks irretrievably and, while it's certainly true that pbf is smaller and parses faster, distributing only pbf would mean someone would have to learn those extra tools/commands to start using the data. xml, despite its many flaws, at least has myriad libraries, bindings and tools which make it easier to experiment with processing and transforming osm data. these experimental planet/history files are also line-oriented, which means one can even do quick-and-dirty grep/sed/awk work for ad-hoc analysis.

cheers,

matt

_______________________________________________
dev mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/dev
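as a small, hypothetical illustration of the quick-and-dirty line-oriented analysis mentioned above - the file contents and element IDs below are made up, but the one-element-per-line layout matches what the extracts look like:

```shell
# build a tiny stand-in for a line-oriented history extract
# (real planet files are vastly larger, but the layout is the same idea)
cat > sample.osh <<'EOF'
<node id="1" version="1" lat="51.50" lon="-0.12"/>
<node id="1" version="2" lat="51.51" lon="-0.12"/>
<way id="42" version="1"/>
<way id="42" version="2"/>
<way id="42" version="3"/>
EOF

# how many versions of way 42 does the file contain?
grep -c '<way id="42"' sample.osh        # prints 3

# which version numbers? pull them out with sed
grep '<way id="42"' sample.osh | sed 's/.*version="\([0-9]*\)".*/\1/'
```

nothing here would work on a pbf file without first converting it back to xml, which is part of the point.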

