[OSM-talk] A forest ... what?

2017-04-10 Thread Sandor Seres
Three weeks ago I posted some multipolygon related notes. This mail is, in a
way, an addition to that former mail.

My first note was triggered by some users' worries about poorer maps if they
use data from the osm2pgsql preparation. Dropping "broken multipolygons"
will result in many large empty/white places and a long repair
period. Strengthening the preparation on this point might be a better
option in my opinion (I know, I was there). However, in the end, how this
subject is handled is entirely up to the authors of the osm2pgsql
application. Users starting from the OSM source data will not be affected
whichever strategy is selected.

The second note was related to the mass/programmatic correction of the
source data. This could have a dangerous/damaging impact on many OSM users.
Fortunately, the replies say that programmatic correction is not a strategy
in the "fixing multipolygons" actions. I mentioned the "self-crossings"
issue, which is not an error for many users (depending on which
interpretations of the notions and which tools one uses). To clear up the
confusion, this note needs some additional words. Assume someone corrected
all polygon self-crossings in the source data. Assume the selected fixing
model is the popular dividing model (the polygon is divided into new
polygons between self-crossings). The "fix" will be correct but the
consequences damaging. Namely, in scaling and rendering, the new small areas
quickly reach a negligible/collapsing size, causing breaks. Here it is worth
noting that the self-crossing issue is a topic in modern vector based
digital mapping even if all self-crossings are somehow resolved in the
source data. Namely, while scaling and doing edge-smoothing in data
generalisation, self-crossings on thin area sections (like fiords,
peninsulas, rivers and so on) are unavoidable and dividing produces many
tiny areas. High fragmentation of the source data and freedom of tag
selection (river sections tagged as lakes) make the issue even worse. Just
look at the Amazonas river-system rendering from a popular vector map-maker
here: http://goo.gl/bT1Bu9

(the screen dump is from yesterday, from a demo system, at roughly 1:6.7
million scale). There are really many large, unacceptable breaks. However,
from the same data source, using topology geometry as suggested in my former
mail, it is possible to create a compact minimal coverage for the same river
system, like this: https://goo.gl/pNQwDm . Note that
the river system here is one simple area (one outer and many inner borders
never touching each other) from Peru to the Atlantic. To be fair,
the last image should be rendered from a zoom/scale level that corresponds
to the 1:6.7 million scale. This is done here:
https://goo.gl/eaAWNy and that zoom level contains approximately 250 times
fewer nodes than the level used for the previous image. The area connectivity
is still perfectly preserved and the image is much cleaner in this scale
extract. Finally, if a user still insists on fixing the polygon
self-crossings, exchanging and reversing the poly-lines between two
consecutive self-crossings (possibly just reversing the end loop after a
self-crossing) should be a much better strategy.
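
To make the loop-reversing idea concrete, here is a minimal Python sketch of one possible reading of it. It is not taken from any existing tool: the function names are hypothetical, only the first proper crossing is handled, and touching or collinear edges are ignored. The ring is split at the crossing point and the loop between the two crossing edges is reversed, so the result stays one ring with a single orientation instead of being divided into separate small polygons.

```python
def seg_intersection(p, q, r, s):
    """Return the proper crossing point of segments pq and rs, or None."""
    (px, py), (qx, qy), (rx, ry), (sx, sy) = p, q, r, s
    d = (qx - px) * (sy - ry) - (qy - py) * (sx - rx)
    if d == 0:
        return None  # parallel or collinear edges are ignored in this sketch
    t = ((rx - px) * (sy - ry) - (ry - py) * (sx - rx)) / d
    u = ((rx - px) * (qy - py) - (ry - py) * (qx - px)) / d
    if 0 < t < 1 and 0 < u < 1:
        return (px + t * (qx - px), py + t * (qy - py))
    return None


def signed_area(ring):
    """Shoelace formula; ring is a list of (x, y) without a repeated end point."""
    n = len(ring)
    total = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2


def reverse_first_loop(ring):
    """Reverse the loop enclosed by the first self-crossing (hypothetical helper)."""
    n = len(ring)
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edge share a vertex
            x = seg_intersection(ring[i], ring[i + 1], ring[j], ring[(j + 1) % n])
            if x is not None:
                loop = ring[i + 1 : j + 1]
                # Insert the crossing point twice and walk the loop backwards.
                return ring[: i + 1] + [x] + loop[::-1] + [x] + ring[j + 1 :]
    return ring  # no proper crossing found


bowtie = [(0, 0), (2, 2), (2, 0), (0, 2)]
fixed = reverse_first_loop(bowtie)
```

For the bow-tie above, the two lobes have opposite orientation and cancel in the shoelace sum. After the reversal both lobes are oriented the same way, and the ring only touches itself at the former crossing point.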

However, the third and last note was my major point. Just to remind: there
is a large set of area related anomalies caused by relations between objects
from different classes (between seas, forests, lakes, rivers, ...). The extent
and complexity of this set is far beyond the "broken polygons" issue and
should be more in the development focus. Even if the areas/multipolygons
within a class are in perfect conformity with the strictest OSM and OGC
rules, these anomalies are still there, though sometimes hardly visible in
maps. Therefore many map-makers tolerate them, but in GIS systems they appear
as strong limitations and should not be tolerated. In the former mail I
presented many examples and some hints on how these anomalies could be
resolved. Unfortunately, the discussion went in the wrong direction, towards
the Scandinavian forests, although the choice of region is irrelevant for the
subject. To avoid much repetition I will present further examples without
procedural details. The illustrations are from the area of Japan (one of
the best mapped areas) and the source is the standard OSM dump from some
weeks ago.

Honestly, I am not sure what a forest is. More precisely: if you ask me, I
know; if you ask me to tell what it is, I do not know. However, among the
many interpretations, I am closest to accepting the topological
interpretation of the notion: the green area in the front page map (or in
other OSM based maps), usually covering the areas tagged as forest and/or
wood. In Japan, as everywhere, forests are uploaded highly fragmented and
they overlap in the strangest combinations; the same goes for river and lake
area objects. The most common case is when borders of

[OSM-talk] Fixing broken multipolygons, some notes

2017-03-18 Thread Sandor Seres
I am new to this list and therefore apologize for possible
misinterpretations and wrong style. The motivation for this mail is a
worried mail on the local list about the poorer osm2pgsql based maps and
about the "broken polygons" fixing strategies. The mentioned white spots in
the Scandinavian forests are just an illustration. If broken polygons are
simply dropped, empty spots will be typical for any area type and for any
corner of the Planet.

As I understand it, osm2pgsql is an application doing data preparation from
the OSM source data up to a DB used by many mapmakers for rendering. We can
see that almost all OSM based public mapping systems use this database and
consequently repeat the same anomalies. Therefore, maybe, making osm2pgsql
more robust could be a better strategy. There is still a large
potential for such strengthening. Just waiting for "do-ocracy" repairs
is really a long-term strategy. Anyway, users starting from the source OSM
data will not be affected by any of these strategies.

The "Fixing broken polygons", especially programmatic/mass fixing, could be
more dangerous to all users. Just look at the many possible self-crossing
fixing options. Loosely defined notions are open to different interpretations
and different sets of error criteria. Consequently, for the same object type
we may have (and we do have) different error classes and repair tools.
Besides the typical interpretations of a polygon as an area (ESRI polygon
redefinition) or as a closed polygonal line, we simply can't find in the
documentation what the "outer", "inner", "hole", ... notions actually mean.
The interpretation (individual perception) of these notions is left to us and
there we have a source of misunderstandings. For instance, if we assume that
"outer" border polygons define the interior candidate points (points inside
and on the border) and "inner" border polygons define (in the same way)
exterior points of the area, then self-crossings, touching polygons, polygon
overlaps, crossings, ... are not errors at all.
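
The interpretation in the last sentence can be sketched as a simple point test. This is only an illustration under stated assumptions: the names `inside` and `in_area` are hypothetical, "inside" is decided by the even-odd ray casting rule (one of several possible rules for self-crossing rings), and points lying exactly on a border are not treated specially here, although the interpretation above counts them in.

```python
def inside(ring, pt):
    """Even-odd ray casting; ring is a list of (x, y) without a repeated end point."""
    x, y = pt
    hit = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def in_area(outers, inners, pt):
    """A point belongs to the area if some outer ring contains it and no inner ring does."""
    return any(inside(r, pt) for r in outers) and not any(inside(r, pt) for r in inners)


# Two overlapping outer rings and one inner ring that overlaps both of them.
outers = [[(0, 0), (4, 0), (4, 4), (0, 4)], [(2, 2), (6, 2), (6, 6), (2, 6)]]
inners = [[(1, 1), (3, 1), (3, 3), (1, 3)]]

# A self-crossing (bow-tie) outer ring still gives a well-defined answer.
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2)]
```

Under this reading, overlapping or self-crossing rings never make the membership question ambiguous, which is why they need not be treated as errors.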

However, my point here is still something else. The "broken multipolygon"
(whatever that means) issue is just "the tip of the iceberg". There still
remains a huge number of anomalies caused by relations between area objects
from different area classes. I intentionally use the notion of anomaly, as a
milder form of error, because many people/mapmakers may live with them.
But a modern GIS system and vector-layer based digital cartography cannot
tolerate them. Let me present some arguments and illustrations. Let us look
at a map extract from the mentioned Scandinavian forests here:
http://osm.org/go/0Tt1PZIt- . The example could be taken from any corner of
the Planet and, as mentioned, there is a huge number of similar cases. At
first glance, everything looks correct and nice (and it is). However, we see
immediately that something is still wrong. The forest type symbols are
placed directly over the water. In another style, typical land related names
are on the water, like here: http://osm.org/go/0Tt1PZIt-?layers=T . Looking at
the source data we can see that the lake in the middle is placed over an
empty space (intentionally, not a hole) where the border of the lake runs
slightly in and out of the forests. At the same time, we can see many
forest areas inside the mentioned empty space overwritten by the lake, which
has no holes. Consequently, there are many missing islands in the lake and
many missing forest areas in the extract. Note that in that little
extract alone there are more than 40 of the described anomalies. What is
more, there are many lakes with borders running in and out of forest areas
(corridor border overlaps), having considerable parts over a forest and
holes in forests, partly overlapping several disjoint forest areas and so
on, and vice versa. Extending the case to the Planet and to other
combinations of area types, we get a feeling for the extent of the issue.
There were attempts to compensate for these problems in rendering, like
rendering the holes, rendering smaller objects over larger ones and so on.
These actions generally do not work. Simply, they help in some places and do
damage in others. So, the question is whether we can do anything about the
problem, and what. Just waiting for do-ocracy based repairs is, obviously,
irrational. Fortunately, the source data has a large potential for removing
most of the mentioned anomalies. Let me present some hints in bullets for
the forest, lake and river combinations.

Assume {F0} is the set of all forest outer border polygons (closed polygonal
lines) and {F1,L0,R0} is the set of all inner forest, outer lake and outer
river border polygons (the orientations and the relations are irrelevant).
Then, you can prove the existence of a minimal disjoint simple area
coverage of the forests. In other words, you can find a set of isolated
simple areas (one outer and zero or any number of inner polygons) where any
area point is on/inside of at least one element in {F0} and never on/inside
of any element in