Re: [Foundation-l] RevisionRank: automatically finding out high-quality revisions of an article
On 20 December 2011 01:16, Tom Morris t...@tommorris.org wrote:

> Under your metric, in this scenario, the edits of a sysop and an experienced user, or later the WikiProject editors, would not be chosen as the high-quality stable version.

Yao did in fact mention that other factors would need consideration. And being able to pick a hole doesn't make the algorithm useless - Google certainly went past simple PageRank very early on. The question is whether Yao's algorithm has markedly better results than just picking the latest. This would warrant investigation, at the least.

- d.

___
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
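Stripped of refinements, the metric Gerard is weighing reduces to a single pass over an article's edit history. A minimal sketch (the `(revision_id, timestamp)` input format and the 6-month default window are my own assumptions for illustration, not anything from MediaWiki):

```python
from datetime import datetime, timedelta

def most_stable_revision(history, window=timedelta(days=180), now=None):
    """Return the id of the revision that went longest unchallenged.

    `history` is a list of (revision_id, timestamp) pairs sorted oldest
    first. A revision's "unchallenged span" runs from its timestamp to
    the next edit (or to `now` for the latest revision). Only revisions
    inside the trailing `window` are considered.
    """
    now = now or datetime.utcnow()
    recent = [(rid, ts) for rid, ts in history if ts >= now - window]
    best_id, best_span = None, timedelta(0)
    for i, (rid, ts) in enumerate(recent):
        end = recent[i + 1][1] if i + 1 < len(recent) else now
        if end - ts > best_span:
            best_id, best_span = rid, end - ts
    return best_id
```

Note that this is exactly the metric Tom Morris's scenario attacks: a vandalized revision that sat unnoticed for months outscores a week of intensive collaborative improvement.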
Re: [Foundation-l] RevisionRank: automatically finding out high-quality revisions of an article
On 12/19/2011 11:38 PM, Yao Ziyuan wrote:

> Hi Wikipedians, I seem to have found a way to automatically judge which revision of a Wikipedia article has the best quality. It's very simple: look at that article's edit history and find out, within a specified time range (e.g. the past 6 months), which revision remained unchallenged for the longest time until the next revision occurred.

Hey Ziyuan, that's great! Have you made a statistical analysis of whether the average revision that remained unchallenged for a long time is better than the average other revision? It would seem to me (as it seems to Tom) that that's often a false presumption, though that's probably based on guesses and anecdotal experience.

Regards,
Tobias
[Foundation-l] RevisionRank: automatically finding out high-quality revisions of an article
On Tue, Dec 20, 2011 at 6:38 AM, Yao Ziyuan yaoziy...@gmail.com wrote:

> Hi Wikipedians, I seem to have found a way to automatically judge which revision of a Wikipedia article has the best quality. It's very simple: look at that article's edit history and find out, within a specified time range (e.g. the past 6 months), which revision remained unchallenged for the longest time until the next revision occurred. Of course there can be additional factors to refine this, such as also considering each revision's author's reputation (Wikipedia has a reputation system for Wikipedians), but I still feel the above idea is the simplest and most elegant, just like the original PageRank idea is for Google.
>
> Best Regards, Ziyuan Yao

On Tue, 20 Dec 2011 01:16, Tom Morris t...@tommorris.org wrote:

> Okay, how about this. I find a page today that has had only one edit in the past year. That edit was an IP editor changing the page to insert the image of a man sticking his genitalia into a bowl of warm pasta (I haven't checked Wikimedia Commons but would not be surprised...). Nobody notices the change until I come along and undo it. I then see that it is a topic that interests both myself and a friend of mine, and we collaborate on improving the article together: he writes the prose and I dig out obscure references from academic databases. Between us, we edit the page four or five times a day, every day for a week, improving the article until it reaches GA status. Having nominated it for GA, a WikiProject picks up on the importance of the topic and a whole swarm of editors interested in the topic swoop in and keep editing it collaboratively for months on end. Under your metric, in this scenario, the edits of a sysop and an experienced user, or later the WikiProject editors, would not be chosen as the high-quality stable version. As for author reputation, check out the WikiTrust extension for Firefox - see http://www.wikitrust.net/
>
> -- Tom Morris http://tommorris.org/

Hi Ziyuan Yao, that is an interesting idea, but not necessarily something that one should do automatically. I recently found an article that since 2006 had been telling the world where the Holy Grail had been from the closure of Glastonbury monastery until the start of the twentieth century. It will take some years of non-editing for the new version of that article to become the stable one. Also, some of the articles that our readers are most interested in would look a tad dated. Sarah Palin's article may no longer be at the 25-edits-per-minute stage that it peaked at, but how many years will it be before it becomes as stable as it was a week before she became John McCain's running mate?

Of course the edit history is out there, so the earlier versions are available under the same license as the current version. So any enterprising mirror could adopt a system like this if they thought it would look at least as good as the current Wikipedia. As far as I know no-one has yet, and I suspect if they did they'd have legal problems re libellous statements about living people. Wikipedia at least has the moral and, I hope, legal defence that when we learn of an error we fix it. This sort of system would be automatically displaying an earlier version despite knowing that in many cases it would be displaying false and damaging information.

WSC
Re: [Foundation-l] RevisionRank: automatically finding out high-quality revisions of an article
On Tue, Dec 20, 2011 at 6:51 PM, Tobias church.of.emacs...@googlemail.com wrote:

> Hey Ziyuan, that's great! Have you made a statistical analysis of whether the average revision that remained unchallenged for a long time is better than the average other revision? It would seem to me (as it seems to Tom) that that's often a false presumption, though that's probably based on guesses and anecdotal experience.

Honestly I haven't done a lot of tests, but I did investigate how ultra-stable Linux distributions (Debian, RHEL/CentOS) select stable software packages. I found Debian's model very similar to my idea: the latest software package versions are put in a pool called unstable; if a package version remains in the unstable pool for a certain period of time with no serious bugs discovered, it is automatically moved to the next pool, testing; again, if a package version remains in the testing pool for a certain period of time with no bugs discovered, it is automatically moved to the next pool, stable. Each new major release of Debian is a collection of all packages in the stable pool. (http://en.wikipedia.org/wiki/Debian#Development_procedures) So it seems this trial-by-time approach at least works for a big open source software project like Debian.
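The Debian pipeline Yao describes is a staged version of the same trial-by-time idea. A toy sketch of the promotion step (the pool names follow Debian, but the 60-day quarantine, the dict fields, and applying this to wiki revisions at all are invented here for illustration):

```python
from datetime import datetime, timedelta

POOLS = ["unstable", "testing", "stable"]

def promote(revisions, now, quarantine=timedelta(days=60)):
    """Advance each revision one pool forward once it has sat
    unchallenged in its current pool for at least `quarantine`;
    challenged revisions and revisions already in `stable` stay put."""
    for rev in revisions:
        idx = POOLS.index(rev["pool"])
        if (not rev["challenged"] and idx < len(POOLS) - 1
                and now - rev["entered"] >= quarantine):
            rev["pool"] = POOLS[idx + 1]
            rev["entered"] = now  # the clock restarts in the new pool
    return revisions
```

Run periodically, each surviving revision climbs one pool per quarantine period, which is the "graduation" behavior the Debian comparison relies on.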
Re: [Foundation-l] RevisionRank: automatically finding out high-quality revisions of an article
2011/12/20 David Gerard dger...@gmail.com:

> [...] The question is whether Yao's algorithm has markedly better results than just picking the latest. This would warrant investigation, at the least.

It is just 2-3 hours of work to select 100-200 random articles, check their history and evaluate whether this idea is really going to work... IMHO rather not at all. I just checked 10 random articles in English Wikipedia and found that the current versions are usually better than the most stable ones. It is quite common that the last stable version of an article is covered by a set of bot-made edits. So at the least, bot-made edits should not be taken into consideration when choosing the most stable version.

-- 
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.cbmm.lodz.pl/work.php?id=29&title=tomasz-ganicz
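Ganicz's observation suggests one concrete refinement: drop bot edits before measuring stability, so that a cosmetic bot pass neither ends nor starts an unchallenged span. A sketch (matching bots by a username set is a simplification; MediaWiki actually flags bot accounts via a user group):

```python
from datetime import datetime

def stable_ignoring_bots(history, bot_authors, now):
    """`history` is (revision_id, timestamp, author) tuples, oldest
    first. Bot revisions are skipped entirely, so a human revision
    survives until the next *human* edit."""
    human = [(rid, ts) for rid, ts, author in history if author not in bot_authors]
    best_id, best_span = None, None
    for i, (rid, ts) in enumerate(human):
        end = human[i + 1][1] if i + 1 < len(human) else now
        if best_span is None or end - ts > best_span:
            best_id, best_span = rid, end - ts
    return best_id
```

With the bot filtered out, an older human revision can reclaim the long unchallenged span that a mid-stream bot edit would otherwise have cut short.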
Re: [Foundation-l] RevisionRank: automatically finding out high-quality revisions of an article
On Tue, Dec 20, 2011 at 9:07 PM, WereSpielChequers werespielchequ...@gmail.com wrote:

> [...] Also some of the articles that our readers are most interested in would look a tad dated. Sarah Palin's article may no longer be at the 25-edits-per-minute stage that it peaked at, but how many years will it be before it becomes as stable as it was a week before she became John McCain's running mate?

First, read my last message mentioning Debian. It's possible that we give different types of articles different trial periods. We can manually specify that articles under a certain category must wait 2 months to be considered mature while articles under another category must wait 1 year. Debian also allows a different waiting time to be set for each package (called urgency); urgent packages can graduate faster than normal packages. Or we can automatically determine the waiting time for each article based on how hotly it is edited. Hotly edited articles like Sarah Palin's can automatically have a shorter waiting time to become mature. It's all relative to an article's editing frequency and viewing frequency.

> Of course the edit history is out there so the earlier versions are available under the same license as the current version. So any enterprising mirror could adopt a system
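Yao's suggestion of activity-dependent waiting times can be made concrete with a decay rule; the hyperbolic formula and all the constants below are invented purely for illustration:

```python
from datetime import timedelta

def maturity_window(edits_last_30_days, base=timedelta(days=180),
                    floor=timedelta(days=7)):
    """Shrink the waiting period for hotly edited articles: a dormant
    article waits the full `base`, while heavy edit traffic drives the
    window down toward `floor`."""
    window = base * (1.0 / (1.0 + edits_last_30_days))
    return max(window, floor)
```

Under this rule a quiet backwater article needs six months of silence to count as mature, while a Sarah-Palin-level article only needs to out-survive its own usual edit tempo.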
Re: [Foundation-l] RevisionRank: automatically finding out high-quality revisions of an article
On Tue, Dec 20, 2011 at 10:22 PM, Tomasz Ganicz polime...@gmail.com wrote:

> [...] I just checked 10 random articles in English Wikipedia and found that the current versions are usually better than the most stable ones.

It all depends on the definition of better. When it comes to least vandalized, I think my idea can work; when it comes to most up-to-date and feature-rich, I think the latest revision will work. It's exactly why some people like CentOS for stability and error-freeness while other people like Fedora for having the latest cool features. I said we can have both, by showing two tabs, "Latest" and "Most stable", that let the reader choose which to view.

> It is quite common that the last stable version of an article is covered by a set of bot-made edits. So at the least, bot-made edits should not be taken into consideration when choosing the most stable version.
Re: [Foundation-l] RevisionRank: automatically finding out high-quality revisions of an article
On Tue, Dec 20, 2011 at 14:55, Yao Ziyuan yaoziy...@gmail.com wrote:

> Honestly I haven't done a lot of tests, but I did investigate how ultra-stable Linux distributions (Debian, RHEL/CentOS) select stable software packages. I found Debian's model very similar to my idea [...] So it seems this trial-by-time approach at least works for a big open source software project like Debian.

It works because in order to put packages in the testing/unstable pool you need to have 'Debian Developer' credentials. Getting those credentials is harder than getting an account on Wikipedia, and hence the input into testing/stable can be considered a best effort and doesn't have to account for potential vandalism etc.
[Foundation-l] IRC office hours with the WMF features team, Jan. 4th 2012
Hey everyone,

Since folks have been asking about it, I wanted to announce that the features development team at the Wikimedia Foundation will be holding office hours (in #wikimedia-office) about the general past, present, and future of MediaWiki features being worked on here at the WMF. This will be on January 4th, 2012 at 23:00 UTC. Documentation is on Meta for time conversion and IRC how-tos.[1]

-- 
Steven Walling
Community Organizer at Wikimedia Foundation
wikimediafoundation.org

1. https://meta.wikimedia.org/wiki/IRC_office_hours
Re: [Foundation-l] [Wikimedia Announcements] Announcing new Wikimedia Foundation CTCO: Gayle Karen Young
Welcome, wish you the best,
Mardetanha

On Tue, Dec 20, 2011 at 10:09 PM, Sue Gardner sgard...@wikimedia.org wrote:

> ***Resending this note because the earlier version seemed to have really broken formatting. Hope this is better.***
>
> Hello folks,
>
> I'm delighted to tell you that the Wikimedia Foundation has a new Chief Talent and Culture Officer, Gayle Karen Young.
>
> Recapping: the purpose of the CTCO role is to have a person on staff dedicated to continually strengthening and improving all our practices related to people -- such as recruitment, on-boarding, skills development, organizational design, goal-setting, compensation and performance assessment -- with the overall goal of ensuring that the Wikimedia Foundation's work culture is healthy and high-performance. I created the role because I believe that for organizations to be effective, it's critical that they have good talent and culture practices. Most non-profits skimp on funding HR because they want to be cautious with donors' money, and they think investing in people is a bit of a luxury. I disagree. At the Wikimedia Foundation, half our spending is on salaries -- in other words, on people. So it seems to me that recruiting great people and creating the conditions in which they can flourish is an excellent investment. That's why the Wikimedia Foundation has a CTCO.
>
> Back to Gayle. A few months ago, Cyn Skyberg told Wikimedia she'd be leaving us. I then hired Lisa Grossman of m|Oppenheim to find us a successor for Cyn. Lisa spoke with hundreds of candidates, and brought six to be interviewed by me, Erik and Garfield Byrd. Our finalist candidates then spoke with Cyn, Barry, Geoff and Zack, and worked on projects for us which involved interviewing Aaron Schulz, Alolita Sharma, Asher Feldman, Brandon Harris, CT Woo, Dana Isokawa, Howie Fung, Jay Walsh, Kul Wadhwa, Leslie Harms, Melanie Brown, Rob Lanphier, Steven Walling and Tomasz Finc. They were also interviewed by Jan-Bart de Vreede, the vice-chair of the Board and the chair of the Board's HR committee. It was an extensive search! And I am really happy about the outcome.
>
> Gayle Karen Young is a seasoned HR consultant and organizational psychologist with expertise in leadership development, change management, facilitation, group dynamics, and Agile team effectiveness training. She has worked with a wide variety of non-profit and for-profit organizations across industries including tech, hospitality, restaurants, airlines, healthcare, and education. She is the board president of Spark, a non-profit organization that engages young people in global women's human rights issues. Gayle is also a facilitator for the Stanford Graduate School of Business for their Interpersonal Dynamics course and their Women in Management program. She mentors for the Thiel Foundation's 20 Under 20 Fellowship program, and generally supports futurist causes because she likes audacious ideas and grand challenges. She has designed and facilitated conferences for the Singularity Summit, BIL (TED's un-conference sibling), and the Seasteading Institute. She has a BA in psychology from the University of San Francisco, and an MA in organizational psychology from Alliant International University.
>
> I think Gayle will be a really great culture fit for the Wikimedia movement. She's an iconoclastic geek who goes to ComicCon, but unlike most geeks she is warm and people-centred: when she was a kid, she wanted to grow up to be Deanna Troi from Star Trek. She's insatiably curious and reads widely. She was born in the Philippines and travels annually with Spark, most recently to China and Cambodia. You can read more about Gayle here on her userpage on the English Wikipedia: http://en.wikipedia.org/wiki/User:GayleKaren, and you can see some of the work she's done for us here: http://en.wikipedia.org/wiki/User:GayleKaren/WMF_Recruiting_Strategy_Project .
>
> I want to thank everyone who was involved in this long and elaborate hiring process, and I want to especially thank Cyn. As the Wikimedia Foundation's first CTCO, Cyn had the unenviable task of breaking lots of new ground -- she leaves us in much better shape than she found us, and I'm grateful to her for everything she's done for us.
>
> Gayle will start work January 3. She's a foundation-l subscriber, so I believe she will see any replies to this e-mail. I'm on holiday for the next three days, so if there are any replies to this note that need a response from me, you'll hear from me Friday.
>
> Thanks,
> Sue
>
> --
> Sue Gardner
> Executive Director
> Wikimedia Foundation
>
> 415 839 6885 office
> 415 816 9967 cell
>
> Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us make it a reality! http://wikimediafoundation.org/wiki/Donate

___ Please note: all replies sent to this mailing list will be immediately directed to Foundation-L, the