Hi,

I have finalized a script that can analyze an object's history and determine whether certain edits are "non-edits" (i.e. nothing of note was changed at all) or "harmless" (i.e. the object was changed and might have to be rolled back if the contributor does not agree to the license change, but the rollback will likely not affect the quality much).

The idea behind this is to provide some help in prioritizing the re-mapping effort. If someone who doesn't agree to the contributor terms has made an important contribution then we want to re-map that soon; in places where the same guy has just removed a few created_by tags we can ignore that for now.

My analysis does not mean that something I classify as "harmless" will not be reverted when the license change comes; it might well be. But if it gets reverted, the consequences will be negligible.

What I'm doing is basically to look at the object history, identify each contributor, and find out:

* have they made at least one "normal" contribution to the object - added a node to a way, added or changed a tag, moved a node by more than one metre?

* if not, have they made at least one "harmless" contribution - removed a tag, a node, or a member; moved a node by less than one metre?

* if not, then they are a "zero contributor" to that object.
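
To make the three-way test above concrete, here is a rough Python sketch for a single node edit. It is not taken from harmless.pl: the function names, the dict layout for a node version, and the value 2 for a "normal" edit are all assumptions for illustration (the script's output only names severity 0 and 1), and a move of exactly one metre is arbitrarily treated as harmless.

```python
import math

# Severity levels. 0 and 1 match the script's output; NORMAL = 2 is an assumption.
ZERO, HARMLESS, NORMAL = 0, 1, 2

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula, spherical Earth)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def node_edit_severity(prev, curr):
    """Classify the step from node version `prev` to `curr`.

    Both arguments are hypothetical dicts with keys 'lat', 'lon', 'tags'.
    """
    old_tags, new_tags = prev["tags"], curr["tags"]

    # An added or changed tag is a "normal" contribution.
    if any(old_tags.get(k) != v for k, v in new_tags.items()):
        return NORMAL

    moved = (prev["lat"], prev["lon"]) != (curr["lat"], curr["lon"])
    # A move of more than one metre is a "normal" contribution.
    if moved and distance_m(prev["lat"], prev["lon"],
                            curr["lat"], curr["lon"]) > 1.0:
        return NORMAL

    # A removed tag or a small move is "harmless".
    removed_tag = any(k not in new_tags for k in old_tags)
    if moved or removed_tag:
        return HARMLESS

    # Nothing of note changed: a "zero" edit.
    return ZERO
```

Ways and relations would additionally need a comparison of node lists and members (added is normal, removed is harmless), which is left out here.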

We do indeed have a number of "zero contributors", from times where different editors had different malfunctions - e.g. for a while, if you did a "select all" in JOSM then removed a tag, all objects would be marked as changed even if they did not contain the tag, and you would appear in the object's edit history even though you never changed it. Or Potlatch at some time used to mark a way's member nodes as changed when you changed the way.

(If an object is reverted to an earlier state, then all intermediate edits count as "zero contributions" as well - they might have been valuable but they are not part of the visible object any more.)
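
The revert rule can be sketched roughly like this in Python. This is only an illustration and not how harmless.pl does it; `reverted_versions` and the idea of comparing one hashable "state" per version are assumptions. How the reverting edit itself is rated is not covered here.

```python
def reverted_versions(states):
    """Return the indices of versions that were undone by a later revert.

    `states` is a hypothetical list with one hashable snapshot per version
    (index 0 = version 1), e.g. a tuple of coordinates and sorted tags.
    """
    undone = set()
    for k in range(len(states)):
        # Look for an earlier version with at least one version in between.
        for j in range(k - 1):
            if states[j] == states[k]:
                # Everything strictly between j and k is no longer visible.
                undone.update(range(j + 1, k))
                break
    return undone

# Versions 2 and 3 (indices 1 and 2) are undone when version 4 restores version 1.
history = ["a", "b", "c", "a", "d"]
intermediate = reverted_versions(history)
```

Edits at the returned indices would then be rated as zero contributions regardless of what they changed.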

You can try out my script here, by adding a way/node/relation id to the URL like so:

http://wtfe.gryph.de/harmless/way/40103577

The output is a break-down of what my script thinks has happened to the object, and which edits are zero-edits ("severity: 0") or harmless ("severity: 1"). After the version analysis, it summarizes the user contributions - each user is assigned the highest severity of all their changes.
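
The per-user summary amounts to taking a maximum per contributor. A minimal Python sketch, with a made-up input format of (user id, severity) pairs, one per version; the function name is hypothetical and not from harmless.pl:

```python
def summarize_users(version_severities):
    """Map each user id to the highest severity among their edits.

    `version_severities` is a hypothetical list of (user_id, severity) pairs.
    """
    summary = {}
    for user_id, severity in version_severities:
        summary[user_id] = max(severity, summary.get(user_id, 0))
    return summary

# Made-up user ids; user 7 has one harmless and one zero edit, so ends up at 1.
example = summarize_users([(7, 1), (8, 2), (7, 0), (8, 0)])
```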

The most important output of my script is when it finds that an object that currently looks "tainted", because someone who does not agree to the license change has touched it, is not really problematic at all because the change in question was harmless.

This is the case in the above "way 40103577" example. The version history contains an edit by non-agreeing user 263596, therefore the whole object looks problematic. My script finds out that this edit is simply a tag deletion, and because all other edits are by people who have agreed to the license change, the object does not have to be a top priority for remapping.
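
Expressed as code, the decision for one object could look like the Python sketch below. The function name, the input format and the severity value 2 for a "normal" edit are assumptions; apart from user 263596, the user ids are invented for the example.

```python
def needs_priority_remap(user_severity, agreed_users, normal=2):
    """True if a non-agreeing user made at least one "normal" contribution.

    `user_severity` maps user id -> highest severity for this object;
    `agreed_users` is the set of user ids who accepted the license change.
    `normal=2` is an assumed severity value for "normal" edits.
    """
    return any(severity >= normal
               for user_id, severity in user_severity.items()
               if user_id not in agreed_users)

# Non-agreeing user 263596 only deleted a tag (severity 1); the other
# (invented) users have agreed, so the way is not a top priority.
way_users = {263596: 1, 1001: 2, 1002: 2}
priority = needs_priority_remap(way_users, agreed_users={1001, 1002})
```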

Everyone is invited to play with this script and see what happens. I plan to make this the basis of the v2 WTFE service, meaning that in the future editors will likely *not* highlight stuff that my script deems harmless.

Here's the - hacky, perly - source code: http://wtfe.gryph.de/harmless.pl

Please don't do mass evaluations with this web service, as it runs a "history" query against the API in the backend and this is quite costly. If you want to run this on a large area, download the .pl file and make yourself a full history extract with Peter Koerner's history splitter, then run the perl script on the XML. It can process anything up to the complete planet if you have the patience.

Bye
Frederik

--
Frederik Ramm  ##  eMail [email protected]  ##  N49°00'09" E008°23'33"

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
