Hi,
I'm working on an approach to model the impact of the acceptance of the new
contributor terms on the data. I wanted to kick this around the dev list and
see what people think of the approach as I get it going...
ASSUMPTIONS:
It should be noted at the start that edits by bots do not require acceptance
of the terms and conditions to be kept in the database, nor do large batch
loads of data from a specific PD source such as the TIGER data. These sets of
data will need to be excluded from the analysis if they are connected to
contributors who have not accepted the new terms and conditions.
Any editor can be classified as:
* Accepting the terms and conditions
* Not having answered yet
* Refusing the new terms and conditions
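For modelling purposes, these three states could be represented as a simple
enumeration. This is only an illustrative sketch; the names are my own, not
from any existing code:

```python
from enum import Enum

class TermsStatus(Enum):
    """Status of a contributor with respect to the new contributor terms.
    Names are illustrative, not from any existing OSM codebase."""
    ACCEPTED = "accepted"    # has agreed to the new terms and conditions
    UNDECIDED = "undecided"  # has not answered yet
    REFUSED = "refused"      # has refused the new terms and conditions
```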
Database objects have two main sets of properties:
* Geometry: For points this is the lat/long, for ways and polygons this is
the set of nodes, and for relations this is the set of members.
* Attributes (tags)
Objects also have history and each historical change is linked to a user. Each
historical change can impact either the geometry, the attributes or both.
If an editor has refused the terms and conditions (and they are not a bot or
batch load as defined above), we need to take the changesets they have made
and remove their impact from the data. This means:
* Where geometries have changed:
    * If the object has a prior version, take the prior version of the
geometry and then re-apply the subsequent geometry edits made after theirs.
    * If the object does not have a prior version, then it is probably lost
to the db along with its tags.
* Where attributes have changed:
    * If the specific tag deleted or changed existed in a prior version, roll
back that tag to its latest prior value (which could mean re-adding deleted
tags) and then roll forward any subsequent edits to that tag. Other tags
should be unaffected.
    * If the tag was added in this edit, then it is probably lost to the db.
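To make the tag roll-back/roll-forward concrete, here is a minimal sketch. It
is my own illustrative code, not an existing implementation, and it assumes
each version in an object's history is a full tag snapshot (as in the OSM
history format), with bots/batch loads already filtered out of `refused_users`:

```python
def surviving_tags(history, refused_users):
    """Replay an object's tag history, discarding the contributions of
    users who refused the terms.

    `history` is a list of (user_id, tags_dict) versions, oldest first,
    where tags_dict is the full tag snapshot of that version.
    Returns the surviving tags, or None when the object's creator refused
    (in which case the whole object is probably lost).
    """
    if not history:
        return None
    if history[0][0] in refused_users:
        return None  # initial editor refused: object and its tags likely lost

    tags = {}           # the surviving tag set we are rebuilding
    prev_snapshot = {}  # the snapshot each edit was actually based on
    for user, snapshot in history:
        if user not in refused_users:
            # Apply only the changes this user actually made, relative to
            # the snapshot they edited.
            for k, v in snapshot.items():
                if prev_snapshot.get(k) != v:
                    tags[k] = v  # tag added or modified by an accepting user
            for k in set(prev_snapshot) - set(snapshot):
                tags.pop(k, None)  # tag deleted by an accepting user
        # A refused user's delta is never applied to `tags`, but
        # prev_snapshot still advances so later users' deltas are
        # computed against what they actually edited.
        prev_snapshot = snapshot
    return tags
```

Note that this naturally re-adds tags a refusing user deleted, rolls back
their modifications, and drops tags they added unless a later accepting user
changed them.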
This approach seems reasonable and is what I am starting to model. Of course,
the actual details of how edits by editors who do not sign up for the new
terms and conditions are processed need to be determined by the LWG. I am
happy to help with the implementation of the final logic, however.
PROCESS FOR ANALYSIS:
Using the history file from 2010-08-02, with periodic diff files applied over
time, and a feed of the user ids of those who have accepted the new terms and
conditions (also updated over time), I plan to model and report on the impact
on a regular basis.
The report would be along the lines of a table with one row for each of:
* Points
* Ways/Polygons
* Relations
and for each row/object type show:
* # objects in db as of last update
* # tags in db as of last update
* # objects totally clean (all editors in the history have accepted)
* # objects initial editor accepted
* # objects clean so far (no editors have refused)
* # objects initial editor refused (entire object may be lost)
* # objects with partial data loss (one or more editors have refused, but
not the first one)
* # tags totally clean (all editors in the history have accepted)
* # tags initial editor accepted
* # tags clean so far (no editors have refused)
* # tags initial editor refused (entire tag series may be lost)
* # tags with partial data loss (one or more editors have refused, but not
the first one)
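The bucketing above could be sketched as a small classifier. Since the
categories deliberately overlap (a totally clean object is also clean so far,
for instance), it returns the set of report rows an object contributes to.
All names here are my own, purely for illustration:

```python
def report_flags(history_users, accepted, refused):
    """Return the (overlapping) report categories one object falls into.

    `history_users` lists the editor of each version, oldest first;
    `accepted` and `refused` are sets of user ids. Users in neither set
    have not answered yet. Names are illustrative, not existing code.
    """
    if not history_users:
        return set()
    users = set(history_users)
    first = history_users[0]
    flags = set()
    if users <= accepted:
        flags.add("totally_clean")      # all editors have accepted
    if first in accepted:
        flags.add("initial_accepted")
    if not users & refused:
        flags.add("clean_so_far")       # no editors have refused
    if first in refused:
        flags.add("initial_refused")    # entire object may be lost
    elif users & refused:
        flags.add("partial_loss")       # a later editor refused
    return flags
```

Summing these flags per object type (points, ways/polygons, relations) would
give the row counts in the table; an analogous classifier would run per tag.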
As this goes further along, we can look at what types of data are at risk and
break the objects out by country... Only as needed, of course.
This will take some significant work and processing power, so I want to be
sure that the methods and metrics are useful to the community and reflect our
intent as a community. Hence posting it here on dev...
Looking forward to some constructive feedback.
Best Regards,
Jim
Jim Brown - CTO CloudMade
email: [email protected]
skype: jamesbrown_uk
_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev