Re: [Foundation-l] Wikistats is back

2009-01-02 Thread Gerard Meijssen
Hoi, On that note ... http://hardware.slashdot.org/article.pl?sid=09/01/02/1546214 Thanks, GerardM 2009/1/1 geni geni...@gmail.com 2008/12/25 Gerard Meijssen gerard.meijs...@gmail.com: Hoi, It is not one either. It has been said repeatedly that the process of a straightforward back

[Foundation-l] Wikistats is back

2009-01-02 Thread Erik Zachte
A week ago I published new wikistats files, for the first time in 7 months, only to retract them 2 days later, when it turned out that counts for some wikis were completely wrong. After some serious bug hunting I nailed the creepy creature that had been hiding in an unexpected corner (most bugs

[Foundation-l] Wikistats is back to May 2008 version

2008-12-25 Thread Erik Zachte
There is something seriously wrong with the figures for some Wikipedias in the new wikistats reports. The figures for some wikis are much too low. When comparing CSV files (raw counts) produced in May 2008 and produced recently, it is quite easy to tell the difference. For some wikis the data for
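A minimal sketch of the comparison described above, in Python, assuming hypothetical file names and a simple wiki-code,count column layout (the real wikistats CSV format may differ):

    # Compare raw-count CSV files from two wikistats runs and flag wikis
    # whose counts dropped sharply. File names and columns are illustrative.
    import csv

    def load_counts(path):
        # expects rows like: wiki_code,article_count
        with open(path, newline="") as f:
            return {row[0]: int(row[1]) for row in csv.reader(f)}

    may = load_counts("counts_2008_05.csv")      # hypothetical file names
    recent = load_counts("counts_2008_12.csv")

    for wiki, old in sorted(may.items()):
        new = recent.get(wiki)
        if new is not None and new < 0.5 * old:
            print(f"{wiki}: {old} -> {new} (suspiciously low)")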

Re: [Foundation-l] Wikistats is back

2008-12-25 Thread Aryeh Gregor
On Wed, Dec 24, 2008 at 7:09 PM, Brian brian.min...@colorado.edu wrote: Interesting. I realize that the dump is extremely large, but if 7zip is really the bottleneck then to me the solutions are straightforward: 1. Offer an uncompressed version of the dump for download. Bandwidth is cheap and

[Foundation-l] Wikistats is back

2008-12-24 Thread Erik Zachte
New wikistats reports have been published today, for the first time since May 2008. The reports have been generated on the new wikistats server ‘Bayes’, which has been operational for a few weeks. The dump process itself had been restarted some weeks earlier, and new dumps are now available for all 700+

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Jon
Thank you Erik! Erik Zachte wrote: New wikistats reports have been published today, for the first time since May 2008. The reports have been generated on the new wikistats server ‘Bayes’, which has been operational for a few weeks. The dump process

[Foundation-l] Wikistats is back

2008-12-24 Thread Erik Zachte
John: For the Page Views data on some projects, the May data looks unusually low compared to the June data; could it be that the May data isn't a complete month for some projects? Yes, that is indeed the case. I will omit the incomplete month from subsequent reports. Erik Zachte
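One way to implement that omission, sketched in Python under the assumption that page views arrive as per-day counts keyed by (year, month); the real wikistats pipeline may store this differently:

    # Keep only months for which every day of data is present.
    import calendar

    def complete_months(daily_views):
        # daily_views maps (year, month) -> list of daily page-view counts
        return {
            (y, m): sum(days)
            for (y, m), days in daily_views.items()
            if len(days) == calendar.monthrange(y, m)[1]  # full coverage
        }

    views = {(2008, 5): [120_000] * 20,   # May: only 20 of 31 days collected
             (2008, 6): [130_000] * 30}   # June: complete
    print(complete_months(views))         # {(2008, 6): 3900000}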

[Foundation-l] Wikistats is back

2008-12-24 Thread Erik Zachte
Hi Brian, Brion once explained to me that the post-processing of the dump is the main bottleneck. Compressing articles with tens of thousands of revisions is a major resource drain. Right now every dump is even compressed twice, into bzip2 (for wider platform compatibility) and 7zip format (for
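The double compression Erik describes could be scripted roughly as below; the file name is hypothetical and the sketch assumes the bzip2 and 7za command-line tools are installed:

    # Compress one dump twice: bzip2 for portability, 7z for size.
    import subprocess

    dump = "pages-meta-history.xml"                                # hypothetical
    subprocess.run(["bzip2", "--keep", dump], check=True)          # -> dump.bz2
    subprocess.run(["7za", "a", dump + ".7z", dump], check=True)   # -> dump.7z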

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Brian
Also, I wonder if these folks have been consulted for their expertise in compressing Wikipedia data: http://prize.hutter1.net/ On Wed, Dec 24, 2008 at 5:09 PM, Brian brian.min...@colorado.edu wrote: Interesting. I realize that the dump is extremely large, but if 7zip is really the bottleneck

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread David Gerard
2008/12/25 Erik Zachte erikzac...@infodisiac.com: Hi Brian, Brion once explained to me that the post-processing of the dump is the main bottleneck. Compressing articles with tens of thousands of revisions is a major resource drain. Right now every dump is even compressed twice, into bzip2

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Robert Rohde
On Wed, Dec 24, 2008 at 4:09 PM, Brian brian.min...@colorado.edu wrote: Interesting. I realize that the dump is extremely large, but if 7zip is really the bottleneck then to me the solutions are straightforward: 1. Offer an uncompressed version of the dump for download. Bandwidth is cheap and

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Brian
Hi Robert, I'm not sure I agree with you... (3 terabytes / (10 megabytes per second)) in days ≈ 3.64 days. That is, on my university connection I could download the dump in just a few days. The only cost is bandwidth. On Wed, Dec 24, 2008 at 6:46 PM, Robert Rohde raro...@gmail.com wrote: On Wed,
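Brian's figure checks out with binary units; a quick reproduction of the arithmetic (3 TiB at a sustained 10 MiB/s, his stated numbers):

    # Back-of-envelope download time for an uncompressed dump.
    size_bytes = 3 * 2**40             # 3 tebibytes
    rate = 10 * 2**20                  # 10 MiB/s sustained
    print(size_bytes / rate / 86_400)  # ~3.64 days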

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Robert Rohde
On Wed, Dec 24, 2008 at 6:05 PM, Brian brian.min...@colorado.edu wrote: Hi Robert, I'm not sure I agree with you... (3 terabytes / (10 megabytes per second)) in days ≈ 3.64 days. That is, on my university connection I could download the dump in just a few days. The only cost is bandwidth. While

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread David Gerard
2008/12/25 Brian brian.min...@colorado.edu: But at least this would allow Erik, researchers and archivers to get the dump faster than they can get the compressed version. The number of people who want this can't be 100, can it? It would need to be metered by an API I guess. Maybe we can

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Robert Rohde
On Wed, Dec 24, 2008 at 6:29 PM, Brian brian.min...@colorado.edu wrote: I'm also curious, what is the estimated amount of time to decompress this thing? Somewhere around 1 week, I'd guesstimate. -Robert Rohde ___ foundation-l mailing list
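Robert's one-week guess is consistent with modest single-threaded decompression speed; a rough check, where the 5 MB/s throughput is an assumed figure, not a measurement:

    # Rough decompression-time estimate for ~3 TB of output.
    output_bytes = 3e12                         # ~3 TB uncompressed
    throughput = 5e6                            # assumed 5 MB/s of 7zip output
    print(output_bytes / throughput / 86_400)   # ~6.9 days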

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread geni
2008/12/25 David Gerard dger...@gmail.com: 2008/12/25 Brian brian.min...@colorado.edu: But at least this would allow Erik, researchers and archivers to get the dump faster than they can get the compressed version. The number of people who want this can't be 100, can it? It would need to be

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Gerard Meijssen
Hoi, It is not one either. It has been said repeatedly that the process of a straightforward backup is something that is done on a regular basis. This, however, includes a lot of information that we do not allow to be included in the data export that is made available to the public. So never mind