Robert Rohde wrote:
> When one downloads a dump file, what percentage of the pages are
> actually in a vandalized state?
>
> This is equivalent to asking, if one chooses a random page from
> Wikipedia right now, what is the probability of receiving a vandalized
> revision?

Is there a possibility of re-running the numbers to include traffic weightings?

I would hypothesize from experience that if we adjust the "random page" selection to account for traffic (to get a better view of what people are actually seeing), we would see somewhat different results. I think long-lived vandalism would make up a much smaller percentage of what readers encounter, for precisely the reason you identified: most vandalism that persists for a long time does so because it sits on obscure pages that no one is visiting.

That doesn't mean it is not a problem, but it does change some thinking about what kinds of tools are needed to deal with that problem. I'm not sure what else would change.
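
To make the traffic-weighting idea concrete, here is a rough sketch of the two ways of counting. The page list, the view counts, and the function name are all made up for illustration; none of it comes from the actual study or from real traffic data. It only shows how the uniform "random page" rate and a view-weighted rate can diverge when the vandalized pages are the obscure ones.

```python
def vandalism_rates(pages):
    """Compare two estimates of how much vandalism is 'seen'.

    pages: list of (views, is_vandalized) tuples (hypothetical data).
    Returns (uniform_rate, traffic_weighted_rate).
    """
    total_pages = len(pages)
    total_views = sum(views for views, _ in pages)

    # Uniform: every page counts equally, as in a random-page sample.
    uniform = sum(1 for _, bad in pages if bad) / total_pages

    # Traffic-weighted: every page view counts equally, so a page's
    # contribution is proportional to how often it is actually read.
    weighted = sum(views for views, bad in pages if bad) / total_views

    return uniform, weighted


# Invented example: two obscure vandalized pages, three clean busy ones.
sample = [
    (10000, False),
    (5000, False),
    (985, False),
    (10, True),
    (5, True),
]

uniform, weighted = vandalism_rates(sample)
print(f"uniform: {uniform:.2%}, traffic-weighted: {weighted:.4%}")
```

With numbers like these the uniform rate looks alarming while the weighted rate is tiny, which is the effect I would expect to see to some degree in the real data. Whether it actually shows up would depend on rerunning the analysis with real per-page traffic.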
