2009/2/25 Nathan <[email protected]>:
> http://www.nytimes.com/external/readwriteweb/2009/02/25/25readwriteweb-amazon_exposes_1_terrabyte_of.html
>
> According to this, a new project by Amazon that makes a terabyte of public
> data available includes a full dump of Wikipedia. It also includes the
> complete dbpedia - so it seems like there are likely to be lots of
> duplicates. Given the other information it says it includes (the whole human
> genome, all other publicly available DNA sequences, census data, etc.) I'm
> not sure how it all fits in a single terabyte. Interesting concept, though.
> I wonder how old the dump is, since they've been unavailable for some time?
It probably only contains the latest copy of each page in the main namespace, rather than a full dump (I can't see why they would want a full dump). That's pretty small (a bit larger if they've included images, of course). I think there have been article dumps of enwiki reasonably recently; it's just the full dumps that always fail.
