On Thu, Aug 20, 2009 at 14:10, Anthony wrote:
> On Thu, Aug 20, 2009 at 1:55 PM, Nathan wrote:
>>
>> My point (which might still be incorrect, of course) was that an analysis
>> based on 30,000 randomly selected pages was more informative about the
>> English Wikipedia than 100 articles about serving United States Senators.
>
> Any automated method of finding vandalism is doomed to failure. I'd say its
> informativeness was precisely zero.
>
> Greg's analysis, on the other hand, was informative, but it was targeted at
> a much different question than Robert's.
>
> "if one chooses a random page from Wikipedia right now, what is the
> probability of receiving a vandalized revision" The best way to answer that
> question would be with a manually processed random sample taken from a
> pre-chosen moment in time. As few as 1000 revisions would probably be
> sufficient, if I know anything about statistics, but I'll let someone with
> more knowledge of statistics verify or refute that. The results will depend
> heavily on one's definition of "vandalism", though.
I did this in an informal fashion in 2005 during my "hundred article"
surveys. Of the 503 pages I looked at, only one was clearly vandalized the
first time I looked at it, so I'd say a thousand samples is probably too few
to estimate the vandalism rate with any real precision.

--
Mark Wagner
[[User:Carnildo]]

_______________________________________________
foundation-l mailing list
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
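A rough way to weigh the 1,000-revision estimate against the 1-in-503 observation is a binomial confidence interval. The sketch below is only an illustration under stated assumptions: it treats each sampled page as an independent draw with a fixed vandalism probability, uses the Wilson score interval at 95% confidence (z = 1.96), and takes 1/503 as a stand-in for the true rate. The function names are hypothetical, and neither poster proposed this particular method.

```python
import math

# Assumption: two-sided 95% confidence, normal quantile z = 1.96.
Z95 = 1.96


def wilson_interval(k: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n.

    Assumes the n sampled pages are independent draws with one fixed
    probability of being vandalized.
    """
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


def sample_size_for_relative_error(p: float, rel: float, z: float = Z95) -> int:
    """Samples needed so the normal-approximation half-width equals rel * p."""
    return math.ceil(z * z * (1 - p) / (p * rel * rel))


if __name__ == "__main__":
    # The 2005 survey figure: 1 clearly vandalized page out of 503.
    lo, hi = wilson_interval(1, 503)
    print(f"1/503  -> 95% CI {lo:.4%} to {hi:.4%}")

    # Hypothetical outcome of a 1000-revision sample at a similar rate.
    lo, hi = wilson_interval(2, 1000)
    print(f"2/1000 -> 95% CI {lo:.4%} to {hi:.4%}")

    # Sample sizes for a few target precisions, relative to a rate of 1/503.
    p = 1 / 503
    for rel in (0.5, 0.25, 0.1):
        n = sample_size_for_relative_error(p, rel)
        print(f"+/-{rel:.0%} of p needs n = {n}")
```

Under these assumptions, the interval for 1 in 503 comes out at roughly 0.04% to 1.1%, and pinning a rate near 0.2% down to within ±50% of itself takes roughly 7,700 samples, which is consistent with the view that 1,000 is too few for a precise estimate.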
