2009/2/25 Nathan <[email protected]>:
> http://www.nytimes.com/external/readwriteweb/2009/02/25/25readwriteweb-amazon_exposes_1_terrabyte_of.html
>
> According to this, a new project by Amazon that makes a terabyte of public
> data available includes a full dump of Wikipedia. It also includes the
> complete dbpedia - so it seems like there are likely to be lots of
> duplicates. Given the other information it says it includes (the whole human
> genome, all other publicly available DNA sequences, census data, etc.) I'm
> not sure how it all fits in a single terabyte. Interesting concept, though.
> I wonder how old the dump is, since they've been unavailable for some time?
It probably only contains the latest copy of each page in the main namespace, rather than a full dump (I can't see why they would want a full dump). That's pretty small (a bit larger if they've included images, of course). I think there have been article dumps of enwiki reasonably recently; it's just the full dumps that always fail.
