Hi,

On 2 April 2012 at 22:20, Paul Norman <[email protected]> wrote:

> A tool that operates on the changeset level is
> https://github.com/pnorman/osm-weirdness
>
> It detects changesets that have a high probability of being an import or
> mechanical edit. The detection is pretty crude but it does find a fair
> number of undocumented imports, mechanical edits, and other weirdness. If
> you point it at an old state.txt file it will start in the past and work
> up to the present.

I'll have a look at your script later today.

> When working with the minutely diffs there are some limitations:
>
> Limited knowledge of changesets. In practice, if you start your detection
> an hour in the past you can have a list of all open changesets, but it is
> not possible to know the tags of the changesets.
>
> No knowledge of the previous state of objects. You know where deleted
> objects were, but you can’t tell how far an object is moved or what its
> tags were before. To tell this you need to query a service with a full
> history DB, and handling full history files is difficult.
>
> No knowledge of way geometry if using existing nodes. Iandees’
> https://github.com/pnorman/osm-weirdness/tree/way_check solves this by
> fetching nodes in a way that aren’t also in the changeset from jxapi and it
> can then detect bad geometry (e.g. ways that trace over themselves)
>
> If you were to code a vandalism detection tool I think it should work on
> the minutely replication diffs (
> http://wiki.openstreetmap.org/wiki/Planet.osm/diffs)

I had thought about analysing the data after a changeset is closed, but the
diffs sound good too. I will look into this approach :) Thanks!

On 3 April 2012 at 09:38, Derick Rethans <[email protected]> wrote:

> On Mon, 2 Apr 2012, kabum wrote:
>
> > Result:
> > - each changeset has a total rating -> use a threshold value to divide
> >   them into suspicious and not suspicious
>
> Instead of just using static thresholds, I think that something like SVM
> (http://en.wikipedia.org/wiki/Support_vector_machine) might be highly
> beneficial here; and it's another cool technology to play with. There is
> a cool library for this (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) and
> I know there is at least an extension to use it from PHP:
> http://phpir.com/support-vector-machines-in-php

Thanks for suggesting this method; it seems very suitable for our use case.
I already have some years of experience with PHP, but I would prefer not to
use it for this part of the project. I was thinking of Python (libsvm has
native Python bindings ;))

> > Some questions came up within this preparation:
> > - Is there a preferred language? Has this to be specified within the
> >   proposal? (language skill has to be rated, so I would decide this
> >   during the project phase)
>
> Not really any preferred language. What did you have in mind? For the
> front end I was thinking PHP, but the engine, I wouldn't know. I think
> something high performant (so C or C++) might be beneficial.

My thinking was that Python is easy to set up and can easily be called from
a terminal or included in other Python scripts (e.g. a web frontend). If C++
is necessary because of its speed, I think I could manage that. Last
semester I took part in a software engineering practical course at
university (in a team of five students), where we made extensive use of C++
(https://github.com/brainafk/Empire).

> > - I also would like to discuss used libraries and framework within the
> >   project phase, or should I decide this also in my proposal?
> > - Should the frontend integrate in the current website (ruby on rails
> >   project) or should this just be an optional feature?
>
> I think it can easily live as its own website.

Ok :)

> > - How detailed should be the proposal? Is it enough to formulate this
> >   draft?
>
> That's a tricky one, the more information you provide the better I
> think, as it shows you have thought about it :-)

I think the proposal is growing a lot through this discussion, and I will
try to be as detailed as possible. :)

Thanks for the response :)

Regards,
Morris
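
A minimal sketch of the rating-plus-threshold idea discussed above, applied
to the contents of one minutely diff (osmChange XML). The weights, the
threshold, the sample data, and all function names are assumptions made up
for illustration; none of them come from osm-weirdness or from the proposal.
A trained classifier such as an SVM could later take the place of the fixed
weights and threshold.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"create": 1.0, "modify": 2.0, "delete": 5.0}
THRESHOLD = 20.0


def count_actions(osc_xml):
    """Count created/modified/deleted objects per changeset id."""
    counts = defaultdict(lambda: {"create": 0, "modify": 0, "delete": 0})
    root = ET.fromstring(osc_xml)
    for action in root:  # <create>, <modify> or <delete> blocks
        if action.tag not in WEIGHTS:
            continue
        for obj in action:  # <node>, <way> or <relation> elements
            changeset = obj.get("changeset")
            if changeset is not None:
                counts[changeset][action.tag] += 1
    return counts


def rate(actions):
    """Total rating of one changeset: weighted sum of its action counts."""
    return sum(WEIGHTS[kind] * n for kind, n in actions.items())


def suspicious_changesets(osc_xml, threshold=THRESHOLD):
    """Return the ids of changesets whose rating exceeds the threshold."""
    counts = count_actions(osc_xml)
    return sorted(cs for cs, actions in counts.items()
                  if rate(actions) > threshold)


# Made-up sample diff: changeset 100 makes two small edits,
# changeset 200 deletes five nodes.
SAMPLE = """<osmChange version="0.6">
  <create>
    <node id="1" version="1" changeset="100" lat="51.0" lon="7.0"/>
  </create>
  <modify>
    <way id="10" version="2" changeset="100"><nd ref="1"/></way>
  </modify>
  <delete>
    <node id="2" version="3" changeset="200" lat="51.1" lon="7.1"/>
    <node id="3" version="2" changeset="200" lat="51.2" lon="7.2"/>
    <node id="4" version="4" changeset="200" lat="51.3" lon="7.3"/>
    <node id="5" version="2" changeset="200" lat="51.4" lon="7.4"/>
    <node id="6" version="2" changeset="200" lat="51.5" lon="7.5"/>
  </delete>
</osmChange>"""

print(suspicious_changesets(SAMPLE))  # ['200']
```

Because a diff only carries the objects touched in that minute, a changeset
that spans several diffs would need its counts accumulated across files
before rating it.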
_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev

