Tim, wonderful news! Thank you for making them publicly available!
Of course I immediately downloaded them, and I must have a look at them later this week. Though they are from before I became active (2003), I am very curious whether the articles in these files still exist, and how much they have changed.

teun spaans

On Tue, Dec 14, 2010 at 4:54 PM, Tim Starling <[email protected]> wrote:

> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.
>
> I've long been interested in Wikipedia's history, and I've tried in
> the past to locate such backups. I asked various people who might have
> had one. I had given up hope.
>
> The history of particularly old Wikipedia articles, as seen in the
> present Wikipedia database, is incomplete, due to UseMod's policy of
> deleting old revisions of pages after about a month. The script which
> Brion wrote to import the article histories from UseMod to MediaWiki
> only fetched those revisions which hadn't been purged yet.
>
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.
>
> I've put the two log files up on the web, at:
>
> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
>
> The 7-zip archive is only 8.4MB -- much more manageable than today's
> backups.
>
> rclog contains IP addresses.
> The UseMod software made IP addresses of
> logged-in users public, so the people who made these edits had no
> expectation that their IP address would be kept private. That, coupled
> with the passage of time, makes me think that no harm to user privacy
> can come from releasing these files.
>
> -- Tim Starling
>
> _______________________________________________
> foundation-l mailing list
> [email protected]
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
