I’d like to see the Commons backups available in the AMZN S3 cloud,  even if it 
is only as “requester pays”.  Frankly,  my experience is that getting data from 
the Internet Archive is so slow that I wonder if they are on the Moon.

My infovore framework

http://github.com/paulhoule/infovore

is specifically designed to make Hadoop applications easy to run in your own 
cluster or in a cluster provisioned automatically in Amazon EMR.  In 
particular,  an application can be packaged in the S3 cloud and run by somebody 
with little Hadoop or AWS experience.  This makes handling “big data” much more 
accessible than it ever has been.

AMZN has had a policy of offering free S3 storage for public data sets – I’d 
like to see them take this program to the next level with data sets of this 
nature.

From: Gerard Meijssen 
Sent: Monday, October 14, 2013 4:38 PM
To: Wikimedia Commons Discussion List 
Subject: Re: [Commons-l] [wikiteam-discuss:699] "Tarballs" of all 2004-2012 
Commons files now available at archive.org

Hoi, 

Geni, sorry, but there is a difference between there being a backup of Commons 
within the WMF and there being a dataset of Commons at the IA that is not current. 
People can do all the analysis they want on the old data and it will not make 
any difference. It will not make the data that is currently in Commons any more 
accessible. 

We have been told repeatedly that the data at the WMF is secure. Beyond that, 
the data is like knowing the maximum an insurance policy will pay: you know it 
will not be enough. It is, however, very much a hypothetical question. 
How to make Commons usable is a here-and-now issue.
Thanks,
     GerardM





On 14 October 2013 22:22, geni <[email protected]> wrote:





  On 14 October 2013 13:59, Gerard Meijssen <[email protected]> wrote:

    Hoi,

    While I do agree that it is good to have the data in many places, the 
Internet Archive on its own moves it to several places as well. Many of us have 
seen the IA servers at the Library of Alexandria.


    While it is ok to find a use for the data at the IA, I would like us to 
concentrate first and foremost on how we can make better use of the media that 
is in Commons itself: how we can open it up to more use and make Commons more 
accessible. 




  And you need to stop right there. As in don't express a further opinion until 
you realise how wrong you are. You can't do any analysis on data that is lost. 
And non-backed-up data is just data that doesn't know that it is lost yet.


  -- 
  geni 

  _______________________________________________
  Commons-l mailing list
  [email protected]
  https://lists.wikimedia.org/mailman/listinfo/commons-l




