---------- Forwarded message ----------
From: Reid Priedhorsky <[email protected]>
Date: Thu, Aug 20, 2009 at 9:58 AM
Subject: Re: [Wiki-research-l] [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles
To: [email protected]

On 08/20/2009 11:34 AM, Gregory Maxwell wrote:
> On Thu, Aug 20, 2009 at 6:06 AM, Robert Rohde<[email protected]> wrote:
> [snip]
>> When one downloads a dump file, what percentage of the pages are
>> actually in a vandalized state?
>
> Although you don't actually answer that question, you answer a
> different question:
>
> [snip]
>> approximations: I considered that "vandalism" is that thing which
>> gets reverted, and that "reverts" are those edits tagged with "revert,
>> rv, undo, undid, etc." in the edit summary line. Obviously, not all
>> vandalism is cleanly reverted, and not all reverts are cleanly tagged.
>
> Which is interesting too, but part of the problem with calling this a
> measure of vandalism is that it isn't really, and we don't really have
> a good handle on how solid an approximation it is beyond gut feelings
> and arm-waving.

We looked into this a couple of years ago and came up with a similar
number (though I won't quote it because I don't quite remember what it
was), though we estimated the probability that a viewer would encounter
a damaged article rather than how many articles were currently damaged.

We used the term "damaged" instead of "vandalized" for essentially the
reasons you mention (though I confess I didn't fully read your whole
letter).

Priedhorsky et al., GROUP 2007.

Reid

-----

I'm forwarding Reid's message from Wikiresearch-l to Foundation-l
because, for those interested, it's worth noting that their group
looked into this question & published a paper in 2007. Here's the link:

http://www.grouplens.org/node/113

-- phoebe

_______________________________________________
foundation-l mailing list
[email protected]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
