Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "PoweredBy" page has been changed by ArtoBendiken.
The comment on this change is: Added a section on Datagraph's use of Hadoop..
http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=196&rev2=197

--------------------------------------------------

   * [[http://www.weblab.infosci.cornell.edu/|Cornell University Web Lab]]
    * Generating web graphs on 100 nodes (dual 2.4GHz Xeon Processor, 2 GB RAM, 72GB Hard Drive)
+
+  * [[http://datagraph.org/|Datagraph]]
+   * We use Hadoop for batch-processing large [[http://www.w3.org/RDF/|RDF]] datasets, in particular for indexing RDF data.
+   * We also use Hadoop for executing long-running offline [[http://en.wikipedia.org/wiki/SPARQL|SPARQL]] queries for clients.
+   * We use Amazon S3 and Cassandra to store input RDF datasets and output files.
+   * We've developed [[http://rdfgrid.rubyforge.org/|RDFgrid]], a Ruby framework for map/reduce-based processing of RDF data.
+   * We primarily use Ruby, [[http://rdf.rubyforge.org/|RDF.rb]] and RDFgrid to process RDF data with Hadoop Streaming.
+   * We primarily run Hadoop jobs on Amazon Elastic MapReduce, with cluster sizes of 1 to 20 nodes depending on the size of the dataset (hundreds of millions to billions of RDF statements).

   * [[http://www.deepdyve.com|Deepdyve]]
    * Elastic cluster with 5-80 nodes
