Hi. Having read Adam's anomaly detection spec and Deric's comments, I would like to take the liberty of adding some comments and suggestions for Adam.
I assume you are using the term “anomaly” as a polite synonym for “error”. As a political strategy this is fine, but we should not close our eyes to the types and numbers of errors in OSM data. Talking about errors and how many there are may irritate some people, and you may easily be marked as someone advertising in favor of commercial data sources. I also fully disagree with statements like “…make errors, we don’t mind…”. This might be fine from the editors’ point of view (they are rarely in conflict), but some (or many) errors can be a nightmare for people developing or maintaining mapping systems, navigation systems, LBSs and so on.

Besides, what counts as an error (as opposed to correct) is often based on subjective criteria. Something may look like an error to one person and be just fine for others. The subject you have selected is essential for OSM users, but it is too general and abstract. You may create different error classifications, such as unintentional errors (the editor is not aware of them) versus vandalism (intentional errors meant to harm), or formal (show-stopper) versus logical (some can live with them) errors, and the like. But again, there is no sharp border between similar classes.

Note that you even intend to develop an “engine” (a running system) for anomaly (something that differs from normal) detection. I am afraid you are not aware of what and how many fine details you are going to meet when developing such an engine. I don’t want to discourage you; in the end, the decision is up to you and your mentor. Anyway, I would suggest focusing on a more specific and really current OSM data problem. There are many of them. Just to mention some examples:

- Side conflict (or self-crossing) of area borders. There are many of them. What is more, this event is inevitable in vector smoothing/reduction when creating scale levels for an area object class. Some GIS DB systems do not tolerate such events and either stop working or reject such cases in a control procedure.
There are certain solutions to the problem, but they have side effects and are mostly found in commercial versions. For example, one (rather expensive) system solves this problem by partitioning the area between self-crossings into smaller areas. A side effect is breaks in thin, long areas like rivers, fjords and so on. You could focus on exchanging (and re-orienting) the border sections between consecutive self-crossing points. This may be a good solution with no side effects.

- Detecting erroneous roundabouts (RAs) in a road class, for instance in primary roads. There are many, many of them: ordinary road sections tagged as RAs, RAs tagged as ordinary road sections (or not tagged as RAs), formal errors on RAs, missing connections between (or disconnected) RAs and ordinary road sections, and so on. While many of these errors are hidden in raster mapping, in vector-based mapping they cause serious problems.

- And just to mention one more case, the river/channel fragmentation problem. Naturally, rivers and channels form waterway systems, just like roads and streets. But they are in a highly fragmented format in the source data. What is more, fragments often go missing (disappear) from version to version. So how to detect and repair the errors and create a waterway system is still an open question.

You may find even more specific and motivating “anomalies” in some of my (internal) notes on OSM data error detection and repair from some months ago. You may download and freely use them from here (for the best view, save the original format; Google anti-aliases the already anti-aliased sections, with a poor result):

https://docs.google.com/open?id=0B6qGm3k2qWHqeTNYcVVpRy0zSVk
https://docs.google.com/open?id=0B6qGm3k2qWHqRlhfSU9MV2YxOHM
https://docs.google.com/open?id=0B6qGm3k2qWHqeHg2VEZXQVNNVjg

Of course, the errors/anomalies in the source data come and go, but they are constantly there. Therefore, any effort to provide better data quality is of great importance and value for us OSM users.
I wish you (and your mentor) great success with your project. Best regards, Sandor.
_______________________________________________ dev mailing list [email protected] http://lists.openstreetmap.org/listinfo/dev

