[Hadoop Wiki] Update of "PoweredBy" by ArtoBendiken

Apache Wiki Sat, 08 May 2010 14:30:52 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "PoweredBy" page has been changed by ArtoBendiken.
The comment on this change is: Added a section on Datagraph's use of Hadoop.
http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=196&rev2=197

--------------------------------------------------

  
   * [[http://www.weblab.infosci.cornell.edu/|Cornell University Web Lab]]
    * Generating web graphs on 100 nodes (dual 2.4GHz Xeon Processor, 2 GB RAM, 72GB Hard Drive)
+ 
+  * [[http://datagraph.org/|Datagraph]]
+   * We use Hadoop for batch-processing large [[http://www.w3.org/RDF/|RDF]] datasets, in particular for indexing RDF data.
+   * We also use Hadoop for executing long-running offline [[http://en.wikipedia.org/wiki/SPARQL|SPARQL]] queries for clients.
+   * We use Amazon S3 and Cassandra to store input RDF datasets and output files.
+   * We've developed [[http://rdfgrid.rubyforge.org/|RDFgrid]], a Ruby framework for map/reduce-based processing of RDF data.
+   * We primarily use Ruby, [[http://rdf.rubyforge.org/|RDF.rb]] and RDFgrid to process RDF data with Hadoop Streaming.
+   * We primarily run Hadoop jobs on Amazon Elastic MapReduce, with cluster sizes of 1 to 20 nodes depending on the size of the dataset (hundreds of millions to billions of RDF statements).
  
   * [[http://www.deepdyve.com|Deepdyve]]
    * Elastic cluster with 5-80 nodes
