Hoi,
On that note ...
http://hardware.slashdot.org/article.pl?sid=09/01/02/1546214
Thanks,
GerardM
2009/1/1 geni geni...@gmail.com
2008/12/25 Gerard Meijssen gerard.meijs...@gmail.com:
Hoi,
It is not one either. It has been said repeatedly that the process of a
straightforward back
A week ago I published new wikistats files, for the first time in 7 months,
only to retract them 2 days later, when it turned out that counts for some
wikis were completely wrong. After some serious bug hunting I nailed the
creepy creature that had been hiding in an unexpected corner (most bugs
There is something seriously wrong with the figures for some Wikipedias in
the new wikistats reports. The figures for some wikis are much too low. When
comparing csv files (raw counts) produced in May 2008 with those produced
recently, it is quite easy to tell the difference. For some wikis the data for
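A comparison like that can be scripted; a minimal sketch, assuming csv files
with simple wiki,count rows (the file names and layout here are hypothetical,
not the actual wikistats format):

    import csv

    def read_counts(path):
        # Assumed layout: one "wiki,count" row per wiki.
        with open(path, newline='') as f:
            return {row[0]: int(row[1]) for row in csv.reader(f) if row}

    old = read_counts('counts_2008_05.csv')   # hypothetical file names
    new = read_counts('counts_2008_12.csv')

    for wiki in sorted(old.keys() & new.keys()):
        if new[wiki] < 0.5 * old[wiki]:       # flag counts that dropped sharply
            print(wiki, old[wiki], '->', new[wiki])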
On Wed, Dec 24, 2008 at 7:09 PM, Brian brian.min...@colorado.edu wrote:
Interesting. I realize that the dump is extremely large, but if 7zip is
really the bottleneck then to me the solutions are straightforward:
1. Offer an uncompressed version of the dump for download. Bandwidth is
cheap and
New wikistats reports have been published today, for the first time since
May 2008. The reports have been generated on the new wikistats server
Bayes, which has been operational for a few weeks. The dump process itself had
been restarted some weeks earlier; new dumps are now available for all 700+
Thank you Erik!
Erik Zachte wrote:
New wikistats reports have been published today, for the first time since
May 2008. The reports have been generated on the new wikistats server
‘Bayes’, which has been operational for a few weeks. The dump process
John:
For the Page Views data on some projects, the May data
looks unusually low compared to the June data;
could it be that the May data isn't
a complete month for some projects?
Yes, that is indeed the case. I will omit the incomplete month from subsequent
reports.
Erik Zachte
Hi Brian, Brion once explained to me that the post-processing of the dump is
the main bottleneck.
Compressing articles with tens of thousands of revisions is a major resource
drain.
Right now every dump is even compressed twice, into bzip2 (for wider
platform compatibility) and 7zip format (for
Also, I wonder if these folks have been consulted for their expertise in
compressing Wikipedia data: http://prize.hutter1.net/
On Wed, Dec 24, 2008 at 5:09 PM, Brian brian.min...@colorado.edu wrote:
Interesting. I realize that the dump is extremely large, but if 7zip is
really the bottleneck
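The trade-off described above (bzip2 for wider compatibility, 7zip for size, at
a much higher CPU cost on long revision histories) is easy to get a feel for on
a small scale. A rough sketch using Python's standard bz2 and lzma modules, with
made-up near-identical revisions standing in for a real page history; the
numbers are only illustrative:

    import bz2, lzma, os, time

    # One ~1 MB "article revision", then ten nearly identical copies of it.
    # bzip2 compresses in ~900 KB blocks, so it never sees the redundancy
    # between revisions; LZMA (the algorithm behind 7zip) has a dictionary
    # spanning many megabytes, so later revisions shrink to back-references.
    revision = os.urandom(500 * 1024).hex().encode()   # ~1 MB of text-like data
    history = b''.join(revision + b'edit %d\n' % i for i in range(10))

    for name, compress in (('bzip2', bz2.compress), ('lzma', lzma.compress)):
        start = time.time()
        packed = compress(history)
        print('%s: %.1f%% of original, %.1fs'
              % (name, 100.0 * len(packed) / len(history), time.time() - start))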
2008/12/25 Erik Zachte erikzac...@infodisiac.com:
Hi Brian, Brion once explained to me that the post-processing of the dump is
the main bottleneck.
Compressing articles with tens of thousands of revisions is a major resource
drain.
Right now every dump is even compressed twice, into bzip2
On Wed, Dec 24, 2008 at 4:09 PM, Brian brian.min...@colorado.edu wrote:
Interesting. I realize that the dump is extremely large, but if 7zip is
really the bottleneck then to me the solutions are straightforward:
1. Offer an uncompressed version of the dump for download. Bandwidth is
cheap and
Hi Robert,
I'm not sure I agree with you.
3 terabytes / (10 megabytes per second) ≈ 3.64 days
That is, on my university connection I could download the dump in just a few
days. The only cost is bandwidth.
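That works out as follows, assuming binary units (i.e. 3 TiB at 10 MiB/s), which
is what gives the 3.64-day figure:

    size_bytes = 3 * 2**40            # 3 terabytes (binary)
    rate = 10 * 2**20                 # 10 megabytes per second (binary)

    print(size_bytes / rate / 86400)  # ~3.64 days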
On Wed, Dec 24, 2008 at 6:46 PM, Robert Rohde raro...@gmail.com wrote:
On Wed,
On Wed, Dec 24, 2008 at 6:05 PM, Brian brian.min...@colorado.edu wrote:
Hi Robert,
I'm not sure I agree with you.
3 terabytes / (10 megabytes per second) ≈ 3.64 days
That is, on my university connection I could download the dump in just a few
days. The only cost is bandwidth.
While
2008/12/25 Brian brian.min...@colorado.edu:
But at least this would allow Erik, researchers and archivers to get the
dump faster than they can get the compressed version. The number of people
who want this can't be 100, can it? It would need to be metered by an API
I guess.
Maybe we can
On Wed, Dec 24, 2008 at 6:29 PM, Brian brian.min...@colorado.edu wrote:
I'm also curious: what is the estimated amount of time to decompress this
thing?
Somewhere around 1 week, I'd guesstimate.
-Robert Rohde
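Working backwards from that guess: producing roughly 3 TB of uncompressed output
(the ballpark size mentioned earlier in the thread) in one week implies a
sustained decompression throughput of about 5 MB/s:

    size_bytes = 3 * 2**40               # ~3 TB of uncompressed text (rough figure)
    seconds = 7 * 86400                  # one week
    print(size_bytes / seconds / 2**20)  # ~5.2 MB/s implied output rate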
2008/12/25 David Gerard dger...@gmail.com:
2008/12/25 Brian brian.min...@colorado.edu:
But at least this would allow Erik, researchers and archivers to get the
dump faster than they can get the compressed version. The number of people
who want this can't be 100, can it? It would need to be
Hoi,
It is not one either. It has been said repeatedly that the process of a
straightforward backup is something that is done on a regular basis. This,
however, includes a lot of information that we do not allow to be included in
the data export that is made available to the public. So never mind