> I'm wondering if anyone out there could talk about how resource-intensive
> it would be to set up a system to show the usage of any arbitrary
> tag/value pair (or maybe just the top 5000?) over the course of time.
> Presumably it would be something like what tagstat does, but the results
> would be saved every week so that graphs could be made.
>
> My thought is that maybe something like this could be used to spot bot
> vandalism. Also, it might be helpful to know if a particular set of tags
> is falling into disuse or is gaining in popularity.
>
> And it would look cool.
I'm doing exactly that for the new OSMdoc version (I know I've talked a lot
about it and still have nothing new to show). I take a daily snapshot and
work from the history dump. I'm using the Hadoop stack for this (Hadoop,
Hive, HBase, ...), and it takes two to three servers to run efficiently.
Unfortunately I have no way to host this stuff, so I just run it at home
from time to time.

But that setup is quite elaborate. The same should be possible on a smaller
scale, in a regular PostgreSQL database, with some processing of the
history dump and/or the daily diffs.

I also think this would be great, and that it would be nice to try some
machine learning algorithms on that data set, for instance classifying
changesets as spam.

Cheers,
Lars

_______________________________________________
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
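For the smaller-scale route, the per-snapshot counting step could be sketched
roughly as below, before loading the results into PostgreSQL. This is a
minimal Python sketch, assuming a plain OSM XML snapshot as input; the
function name and the top-N limit are illustrative, not part of tagstat or
OSMdoc:

```python
# Hedged sketch: count tag key=value usage in one OSM XML snapshot.
# Run weekly/daily and store the result with a date column to graph trends.
import xml.etree.ElementTree as ET
from collections import Counter
from io import StringIO

def count_tag_pairs(xml_stream, top_n=5000):
    """Stream an OSM XML dump and return the top_n (key, value) tag pairs."""
    counts = Counter()
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "tag":
            counts[(elem.get("k"), elem.get("v"))] += 1
        elem.clear()  # free parsed elements as we go; real dumps are large
    return counts.most_common(top_n)

# Tiny inline example standing in for a real dump file:
sample = StringIO("""<osm>
  <node id="1"><tag k="amenity" v="pub"/></node>
  <node id="2"><tag k="amenity" v="pub"/></node>
  <way id="3"><tag k="highway" v="residential"/></way>
</osm>""")
print(count_tag_pairs(sample, top_n=2))
# -> [(('amenity', 'pub'), 2), (('highway', 'residential'), 1)]
```

Streaming with iterparse keeps memory roughly constant, so a single modest
machine should cope with one snapshot at a time; the weekly top-N rows could
then go into one PostgreSQL table keyed by (date, key, value) for graphing
and for spotting sudden jumps that might indicate bot vandalism.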