Hi,

people are talking about potential changes to the amount of (personal) data distributed by OSM, in the light of new data protection laws becoming effective in the EU this May. There haven't been any official statements by the OSMF, but discussions are going on in the LWG [1].

Even though it is still unclear what the concrete steps will be, I have done some experiments. How well do our existing tools behave if you feed them OSM data that has less metadata than usual, or no metadata at all?

I have set up a test suite which tests Osmium-Tool (which uses the Libosmium library; master branch), Osmosis 0.44.1 and Osmconvert 0.6. The test suite is available at
https://github.com/geofabrik/metadata-test/
and consists of a Bash script. You need to have osmium, osmosis and osmconvert in your path (or you have to modify the script a bit).

The test suite comes with its own hand-crafted test data, which is first converted to PBF by Osmium. Afterwards, all three tools have to prove themselves in the following challenges:

- converting XML to PBF
- converting PBF to XML
- converting XML to XML
- applying a diff
- deriving changes between two OSM files

All challenges are run four times: once with full metadata, once with timestamp and version fields only, once with the version field only, and once without any metadata. Some PBF challenges also have two variants: one with DenseNodes and one without.

The results are files located in the output/ directory. You have to inspect them manually; I have not written a tool which parses them and reports how many tests failed.

*Results*

I compiled the results into a spreadsheet. You can download it at
https://github.com/geofabrik/metadata-test/raw/master/table.ods

To sum them up:

- Osmium is the only programme which passes all format conversion tests.
- Osmosis cannot read any XML files (OSM and OSC) without timestamp and version fields.
- Osmosis and Osmconvert [2] treat all metadata fields in the DenseInfo message of the PBF format as mandatory, although the format specification doesn't declare these fields as mandatory. Therefore, they write default values into PBF files if the input lacks these fields:
    version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1" (Osmosis [3])
    timestamp="1970-01-01T00:00:01Z" changeset="1" version="1" (Osmconvert)
  This partially applies to the XML output of Osmosis, too.
- Deriving a diff file of the changes between two OSM files only works if both files have the same amount of metadata. If one file contains less or more metadata than the other, all objects appear in the diff file with their new metadata and bloat it. The question is whether this is the desired behaviour (i.e. the ability to strip metadata from a file using large diffs), or whether the tools generating diffs should instead compare the tags, location and members of objects which have the same ID but different metadata.
- Some tools have bugs which lead to wrong diffs (e.g. missing modifications) if some metadata fields are missing.

Best regards

Michael

[1] https://wiki.osmfoundation.org/wiki/Working_Group_Minutes#Licensing_Working_Group
[2] Osmium also had this bug, but it was fixed on the master branch a few days ago.
[3] Osmium cannot parse negative version numbers and throws an exception.

-- 
Michael Reichert                        www.geofabrik.de
Geofabrik GmbH      Handelsregister: HRB Mannheim 703657
Amalienstr. 44     Geschaeftsfuehrung: C. Karch, F. Ramm
76133 Karlsruhe                      Tel: 0721-1803560-3
reich...@geofabrik.de                Fax: 0721-1803560-9
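
The alternative raised in the diff question above (comparing tags, location and members rather than metadata) could be sketched roughly as follows. This is only an illustration under assumptions, not how Osmium, Osmosis or Osmconvert derive changes: objects are modelled as plain Python dicts, and all names (METADATA_KEYS, LEVELS, strip_metadata, content, derive_changes) are made up for this example. The four levels mirror the four test iterations described above.

```python
# Hypothetical sketch: OSM objects as plain dicts, not the data model of any real tool.

# Standard OSM metadata attributes; everything else counts as "content".
METADATA_KEYS = {"version", "timestamp", "changeset", "uid", "user"}

# The four metadata levels of the test runs (level names are invented here).
LEVELS = {
    "full": METADATA_KEYS,
    "version+timestamp": {"version", "timestamp"},
    "version": {"version"},
    "none": set(),
}


def strip_metadata(obj, level):
    """Return a copy of obj that keeps only the metadata allowed by level."""
    keep = LEVELS[level]
    return {k: v for k, v in obj.items() if k not in METADATA_KEYS or k in keep}


def content(obj):
    """Return obj without any metadata (type, ID, tags, location, members remain)."""
    return {k: v for k, v in obj.items() if k not in METADATA_KEYS}


def derive_changes(old_objects, new_objects):
    """Compare two object lists by content only.

    Returns (created, modified, deleted) as sorted lists of (type, id) keys.
    Objects that differ only in metadata are not reported as modified.
    """
    old = {(o["type"], o["id"]): content(o) for o in old_objects}
    new = {(o["type"], o["id"]): content(o) for o in new_objects}
    created = sorted(k for k in new if k not in old)
    deleted = sorted(k for k in old if k not in new)
    modified = sorted(k for k in new if k in old and new[k] != old[k])
    return created, modified, deleted


if __name__ == "__main__":
    node = {
        "type": "node", "id": 1, "lat": 49.0, "lon": 8.4,
        "tags": {"amenity": "cafe"},
        "version": 3, "timestamp": "2018-01-01T00:00:00Z",
        "changeset": 42, "uid": 7, "user": "someone",
    }
    bare = strip_metadata(node, "none")
    # Same object with and without metadata: no change is reported.
    print(derive_changes([node], [bare]))
```

With such a comparison, a file with full metadata and its stripped counterpart produce an empty diff, whereas a comparison that includes metadata would list every object as modified.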
_______________________________________________
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev