It is well known that OSM is an excellent research and development arena for a wide range of institutions. Of course, the major effort comes from the huge number of enthusiastic editors and from the people who take care of the source data services. Many thanks to them. But the real value of all that effort is brought out by the work of the many developers who build applications on OSM data, ranging from simple static maps of a small area with just a few object classes up to the most complex, streaming-based mappings of the whole Planet. I am one of those working on that side, the application side, of OSM.
Mapping based on a parametric (vs. raster) data format implies a certain amount of data preparation (anomaly/error detection, repair, defragmentation, data reduction, format change …). The data preparation is highly application dependent and, as a rule, based on subjective and heuristic criteria (except in some trivial cases). For example, the “8”, a self-crossing area border line, is an error to some but not to others. The same goes for a hole/inner area border line that touches the container/outer border line (or shares a section with it, or even lies partly outside it), partly overlapping complex and complicated areas/multipolygons, lakes in/over lakes, replications, ordinary road sections tagged as roundabouts (or the contrary), line sections with consecutive nodes A,B,A,B,A, almost overlapping areas … just to mention some. Many of these anomalies you never see in raster mappings (blue is blue, brown is brown …).

But then, once all the intense preparation is done, comes the highly rewarding part of the mapping. You can do the most amazing things with your mapping. This excitement has prompted me to write these bullets. Besides, there were questions like how many points are in the OSM source DB, how many poly-lines, how many replications and so on.

Let me present some (maybe boring) data from the end of September Planet dump (I do not keep data logs and protocols for every preparation). The object classes taken into account are: land/sea (area objects created from the coastline class), lakes, forests, rivers, channels, farms, industry, parks, residential and buildings. Further, the line-work classes: motorway, trunk, primary, secondary, tertiary roads, living-street, path, railroad, country border, state border, ferry, tramway, river-lines, channel-lines and streets. Point objects are not in focus here.
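To make two of the anomalies above concrete, here is a minimal sketch of how the A,B,A,B,A node pattern and the “8” (self-crossing border) could be detected. It is only an illustration under stated assumptions, not the actual preparation code: the function names are hypothetical, a line is assumed to be a plain list of node ids, a ring is assumed to be a closed list of (x, y) pairs, and coordinates are treated as planar, which is only reasonable for small areas.

```python
def has_oscillation(nodes, min_run=3):
    """Return True if the node sequence contains a back-and-forth run.

    A run is counted whenever nodes[i] == nodes[i + 2] with a different
    node in between. Three such positions in a row correspond to the
    five-node pattern A,B,A,B,A (min_run is an assumed threshold).
    """
    run = 0
    for i in range(len(nodes) - 2):
        if nodes[i] == nodes[i + 2] and nodes[i] != nodes[i + 1]:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (mere touching is excluded)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def ring_self_crosses(ring):
    """Brute-force O(n^2) check for a self-crossing closed ring (the "8")."""
    segs = list(zip(ring, ring[1:]))
    n = len(segs)
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last segment are neighbours in a closed ring
            if segments_cross(*segs[i], *segs[j]):
                return True
    return False
```

Whether a positive result counts as an error is still the application's decision. Touching borders are deliberately not reported by this check, and a Planet-sized input would need a sweep-line or spatial index rather than the brute-force loop.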
The input/source number of poly-lines    135 657 272
The input/source number of points      1 667 326 987
The number of replicated poly-lines          132 528
The number of replicated points           10 250 786
The number of detected errors                 60 468
The number of corrected errors                47 039

Some notes:

-When detecting and removing replications, the order of the procedures is essential. It makes a considerable difference whether you first detect and remove poly-line replications and then point replications, or the other way around. Also, after removing all the mentioned replications there may still be many replicated poly-lines after linear connection (two different poly-line sets may still produce the same poly-line once they are connected).

-Some other redundant data is not counted as replication. For example, common border sections between area fragments in the same class. Or, some editors (familiar with the white pixel paradigm on alignments) make a slight/thin overlap on common borders.

-Not all uncorrected erroneous objects are ignored. For example, many ordinary road sections tagged as roundabouts are simply moved back to the ordinary sections of the same class.

-Many errors are invisible within a single class and show up only when several classes are analyzed together. For example, many lakes (or lake sections) lie inside/over the water of the land/sea object class (so, be careful with purely diff-based updates).

And so on. The data preparation typically reduces the amount of data by 25-30% (before the generation of the data scale levels).

But then, after all that heavy work, when the mapping system is ready, comes the rewarding part. It is really amazing what you can do, and only your imagination is the limit. Besides the usual LBSs, navigation/routing, sub-mappings … you can:

-With a single click, see the huge Amazon river system blinking (though there are still some gaps on smaller side-rivers).

-In a matter of seconds, calculate the total length of all Planet coastlines, the coastline of a continent, of a huge lake …

-In the same way, estimate the amount of fresh water on the Planet, or the water surface of the Danube river system …

-Simulate how the Planet looks at night, assuming all cities have approximately the same brightness …

-Do safety monitoring within huge sea areas to avoid collisions between floating objects (of a practically unlimited number), inside territorial waters (a corridor belt along the coastline), around critical objects (like platforms) and so on.

Finally, what is really exciting is that experience shows OSM is becoming richer and richer in data and details, while the number of irresolvable errors (or those that need manual intervention) is decreasing.

Best regards,
Sandor. B.
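
P.S. The length calculations mentioned above (coastlines, river systems) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the actual implementation: the function names are hypothetical, each poly-line is assumed to be a list of (lat, lon) pairs in degrees, and the Earth is approximated as a sphere with the mean radius, so results can be off by a few tenths of a percent compared with an ellipsoidal calculation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(points):
    """Sum of great-circle distances between consecutive (lat, lon) points."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))


def total_length_km(polylines):
    """Total length of a whole class, e.g. all coastline poly-lines."""
    return sum(polyline_length_km(pl) for pl in polylines)
```

Note that the result depends strongly on the point density of the source data: the more detailed a coastline is drawn, the longer it gets. Replicated poly-lines should be removed before summing, otherwise they are counted twice.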

