The subject of this compact overview is complex; covered in full detail, it would fill a book. It also comes long before any coding, and it belongs more to solution architecture than to the application area. It may therefore interest only a limited group of developers, especially those interested in vector mapping model options based on OSM data. At the same time, the state of the OSM mapping systems shows that the subject is still highly relevant. At the risk of being misunderstood, we present the basic phases of the model and the major steps/functions in each phase.

- In the first phase we extract four data layers from the regular OSM dump. The layers (data sets) are in a pure geometry format: riverbanks, rivers, lakes and river-lines. This is the only place where we use tags and relations; after this phase we rely on topology and geometry only.

- The next phase eliminates redundancy within each data layer. The steps/functions we perform are: elimination of replicated consecutive nodes and consecutive edges; elimination of exactly replicated poly-objects (polygons or polylines); and elimination of almost replicated poly-objects (based on a corridor criterion). In this phase we remove roughly 9.5% of the nodes (several million).

- Next, we transform each of the four geometries into simple geometries (simple areas and simple polylines). Roughly, the steps are as follows. For the area layers we repair open polygons (of some ten thousand, we repair over 90%): where possible, border segments are connected, gaps are closed and polygonal line end-points are snapped or connected. Next, separately for each area layer, we transform the polygon sets into simple area structures (one outer/container polygon and n >= 0 inner/hole polygons). For the river-lines, the simple polylines contain only nodes of order <= 2.

- In the next phase we create the large areas that cover neighbouring/connected areas from the three area layers. The steps are as follows. From the lake areas we extract only those that overlap considerably with at least one area from the riverbanks or rivers layers; the rest of the lakes are irrelevant here. Note that there are a large number of overlaps among the elements of the three area layers (in any combination of riverbanks, rivers and lakes). Unfortunately, as a rule, the overlapping areas have different topologies (structures), and this is another cause of systematic OSM errors.
In the next step we merge the three simple area sets into one; from here on, the source attribute of an area is ignored. Next, we break a copy of the simple area set into a (huge) number of non-crossing and non-overlapping vectors. Of these we keep only the true border vectors (those with at least one simple area interior on one side and no area interior on the other side). We then connect the border vectors into border polygons. Finally, the border polygon set is transformed, again, into a set of simple areas (with the correct orientation of the container and hole border polygons). These simple areas are simple only in their structure. In fact, they are huge areas, each covering a large number of the input area fragments, and they correspond to the natural river systems (such as the Danube, the Amazon, the Mississippi river system, and so on). In this final set of river systems we managed to remove many thousands of systematic logical errors and a huge volume of redundancy.

- The final phase is optional. From all river-line polylines we extract only those having at least one node in common with a river system/area. From each of these we remove the sections that lie fully inside an area. In this way we end up with a set of river-line segments that have one or two endpoints in common with one or two river areas (indicating side-rivers, connectivity or errors). Note that there are a large number of cases where a river-line section runs along the same river area (causing virtual islands or thickness). Again, this is the cause of another systematic logical error.

Finally, let us mention some arguments for building the OSM river systems:

- The simple area structures reflect the structure of the natural river systems in the best way.
- They contain a very limited amount of redundancy and replication.
- The simple area structure of the river systems is a precondition for efficient later processing phases such as zoom-level generation (generalization by vector smoothing), tiling and so on.
- They contain very few of the systematic errors present in the OSM source data (and in most mapping systems based on OSM data).
- They are an excellent help in detecting gaps that still exist in the river data layers and in making various estimates.

Just to mention some.

Sandor

If interested, more details, illustrations and examples can be found in the white paper here: https://docs.google.com/file/d/0B6qGm3k2qWHqSEVObDhscFFSS00/edit?usp=sharing
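
To make the redundancy-elimination phase described above a little more concrete, here is a minimal Python sketch of two of its steps: dropping replicated consecutive nodes, and dropping exactly replicated poly-objects. It is only an illustration under assumptions, not the actual implementation: the node representation as (lon, lat) tuples and all function names are hypothetical, two poly-objects count as exact replicas only if they have the same node sequence in the same or reversed order (closed rings that start at a different node are not matched), and the corridor-based elimination of almost replicated objects is not covered.

```python
from typing import List, Sequence, Tuple

# Hypothetical node representation: a (lon, lat) coordinate pair.
Node = Tuple[float, float]


def drop_consecutive_duplicates(nodes: Sequence[Node]) -> List[Node]:
    """Remove every node that repeats the node immediately before it."""
    cleaned: List[Node] = []
    for node in nodes:
        if not cleaned or cleaned[-1] != node:
            cleaned.append(node)
    return cleaned


def canonical_key(nodes: Sequence[Node]) -> Tuple[Node, ...]:
    """Return a key that is identical for a node sequence and its reverse."""
    forward = tuple(nodes)
    backward = forward[::-1]
    return min(forward, backward)


def drop_exact_replicas(objects: Sequence[Sequence[Node]]) -> List[List[Node]]:
    """Keep the first occurrence of each poly-object, ignoring direction."""
    seen = set()
    unique: List[List[Node]] = []
    for obj in objects:
        cleaned = drop_consecutive_duplicates(obj)
        key = canonical_key(cleaned)
        if key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique
```

For example, a polyline with a doubled first node and a reversed copy of the same polyline collapse into a single object, while an unrelated polyline is kept.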
_______________________________________________
dev mailing list
https://lists.openstreetmap.org/listinfo/dev

