Hi,

I'm working on an approach to model the impact of the acceptance of the new 
contributor terms on the data.  I wanted to kick this around the dev list and 
see what people think of the approach as I get it going.

ASSUMPTIONS:
To start with, note that edits by bots do not require acceptance of the 
terms and conditions to be kept in the database.  Nor do large batch loads 
of data from a specific PD (public domain) source such as the TIGER data.

These sets of data will need to be taken out of the analysis if they are 
connected to contributors who have not accepted the new terms and conditions.

Any editor can be classified as:

 *   Accepting the terms and conditions
 *   Not having answered yet
 *   Refusing the new terms and conditions
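
As a minimal sketch of this three-way classification (the names TermsStatus, 
classify_editor, accepted_ids and refused_ids are hypothetical, and I am 
assuming the acceptance feed can be reduced to two sets of user ids):

```python
from enum import Enum


class TermsStatus(Enum):
    ACCEPTED = "accepted"
    UNDECIDED = "undecided"  # has not answered yet
    REFUSED = "refused"


def classify_editor(user_id, accepted_ids, refused_ids):
    """Return the terms status of one editor.

    accepted_ids / refused_ids are assumed to be sets of user ids built
    from the acceptance feed; anyone in neither set has not answered yet.
    """
    if user_id in refused_ids:
        return TermsStatus.REFUSED
    if user_id in accepted_ids:
        return TermsStatus.ACCEPTED
    return TermsStatus.UNDECIDED
```

Bot and batch-load accounts are not covered here; they would need to be 
handled separately before this check.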


Database objects have two main sets of properties:

 *   Geometry: For points this is the lat/long, for ways this is the set of 
nodes, for polygons this is the set of nodes and for relations this is the 
member sets.
 *   Attributes (tags)


Objects also have history and each historical change is linked to a user.  Each 
historical change can impact either the geometry, the attributes or both.
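
To make "geometry, attributes or both" concrete, here is a rough sketch that 
compares two consecutive versions of an object.  The (geometry, tags) pair 
layout and the name change_kind are assumptions for illustration only, not 
the real history file format:

```python
def change_kind(prev, cur):
    """Return which parts of an object changed between two versions.

    prev and cur are assumed to be (geometry, tags) pairs, where geometry
    is any comparable value (e.g. a lat/long tuple or a tuple of node ids)
    and tags is a dict.
    """
    kinds = set()
    if prev[0] != cur[0]:
        kinds.add("geometry")
    if prev[1] != cur[1]:
        kinds.add("tags")
    return kinds
```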

If an editor has refused the terms and conditions (and they are not a bot/batch 
load as defined above), we need to take the changesets they have made and 
remove the impact of these changesets from the data.  This means:

 *   Where geometries have changed:
    *   If the object has a prior version, take the prior version of the 
geometry and then apply subsequent geometry edits past theirs
    *   If the object does not have a prior version, then it is probably lost 
to the db along with its tags.
 *   Where attributes have changed:
    *   If the specific tag deleted or changed existed in a prior version, roll 
back that tag to the latest prior version (which could mean re-adding deleted 
tags) and then roll forward subsequent edits to that tag.  Other tags should be 
unaffected.
    *   If the tag was added in this edit, then it is probably lost to the db
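
The rules above could be sketched roughly as a replay of the object history 
that skips whatever a refusing editor changed.  This is only my working 
model, not the final logic: the Version class and rebuild function are 
hypothetical names, geometry is treated as a single value that a later edit 
replaces wholesale, and a tag first added by a refusing editor is treated as 
lost until it is deleted.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    user_id: int
    geometry: tuple
    tags: dict = field(default_factory=dict)


def rebuild(history, refused_ids):
    """Replay history, skipping changes made by refusing editors.

    Returns (geometry, tags), or None if the object is lost because its
    first version was created by a refusing editor.
    """
    first = history[0]
    if first.user_id in refused_ids:
        return None  # no prior version to fall back to
    geometry = first.geometry
    tags = dict(first.tags)
    lost_keys = set()  # tags whose series was started by a refusing editor
    for prev, cur in zip(history, history[1:]):
        refused = cur.user_id in refused_ids
        # Assumption: a later geometry edit simply replaces the geometry.
        if cur.geometry != prev.geometry and not refused:
            geometry = cur.geometry
        for key in set(prev.tags) | set(cur.tags):
            old, new = prev.tags.get(key), cur.tags.get(key)
            if old == new:
                continue
            if old is None:  # tag added in this edit
                if refused:
                    lost_keys.add(key)
                else:
                    lost_keys.discard(key)
                    tags[key] = new
            elif key in lost_keys:
                if new is None:
                    lost_keys.discard(key)  # the lost series ends here
                # otherwise: an edit to a lost tag is dropped as well
            elif not refused:  # roll forward an accepted change or delete
                if new is None:
                    tags.pop(key, None)
                else:
                    tags[key] = new
    return geometry, tags
```

For example, if user 2 refuses, a history where user 2 moved a node, renamed 
it and added a tag falls back to the first version's geometry and name, while 
an unrelated tag added later by another user is kept.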


This approach seems reasonable and is what I am starting to model.  Of course, 
the actual detail of how edits by editors who do not sign up for the new terms 
and conditions are processed needs to be determined by the LWG.  I am happy to 
help with the implementation of the final logic, however.

PROCESS FOR ANALYSIS:
Using the history file from 2010-08-02, with periodic diff files added over 
time, and a feed of the userids of those who have accepted the new terms and 
conditions (also updated over time), I plan to model and report on the impact 
on a regular basis.

The report would be along the lines of a table with one row for each of:

 *   Points
 *   Ways/Polygons
 *   Relations


and for each row/object type show:

 *   # objects in db as of last update
 *   # tags in db as of last update
 *   # objects totally clean (all editors in the history have accepted)
 *   # objects initial editor accepted
 *   # objects clean so far (no editors have refused)
 *   # objects initial editor refused (entire object may be lost)
 *   # objects with partial data loss (one or more editors have refused, but 
not the first one)
 *   # tags totally clean (all editors in the history have accepted)
 *   # tags initial editor accepted
 *   # tags clean so far (no editors have refused)
 *   # tags initial editor refused (entire tag series may be lost)
 *   # tags with partial data loss (one or more editors have refused, but not 
the first one)

As this goes further along, we can look at what types of data are at risk and 
break the objects out by country...  Only as needed of course.
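
The per-object counts in the report could be computed along these lines.  
This is a sketch only: object_flags and summarise are hypothetical names, 
each history is assumed to be reduced to the ordered list of user ids that 
edited the object, and bot/batch-load accounts are assumed to have been 
filtered out beforehand.  Note that the categories overlap, so they are 
counted as independent flags rather than as a partition.

```python
from collections import Counter


def object_flags(editor_ids, accepted_ids, refused_ids):
    """Return the report flags for one object.

    editor_ids is the ordered list of user ids in the object's history
    (first entry is the initial editor).
    """
    first = editor_ids[0]
    any_refused = any(u in refused_ids for u in editor_ids)
    return {
        "totally_clean": all(u in accepted_ids for u in editor_ids),
        "initial_accepted": first in accepted_ids,
        "clean_so_far": not any_refused,
        "initial_refused": first in refused_ids,
        "partial_loss": any_refused and first not in refused_ids,
    }


def summarise(histories, accepted_ids, refused_ids):
    """Count how many objects fall under each report flag."""
    counts = Counter()
    for editor_ids in histories:
        counts["objects"] += 1
        flags = object_flags(editor_ids, accepted_ids, refused_ids)
        for name, hit in flags.items():
            counts[name] += hit  # True counts as 1, False as 0
    return counts
```

The same flags could be applied per tag by feeding in the list of editors 
who touched that tag instead of the whole object.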

This will take some significant work and processing power, so I want to be sure 
that the methods and metrics are of use to the community and reflect our intent 
as a community.  Hence posting it here on dev...

Looking forward to some constructive feedback.


Best Regards,

Jim

Jim Brown - CTO CloudMade
email:   [email protected]
skype:  jamesbrown_uk

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
