Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "PoweredBy" page has been changed by SomeOtherAccount. http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=222&rev2=223 -------------------------------------------------- - Applications and organizations using Hadoop include (alphabetically): + This page documents an alphabetical list of institutions that are using Hadoop for educational or production uses. Companies that offer services on or based around Hadoop are listed in Distributions. <<TableOfContents(3)>> @@ -38, +38 @@ * A 15-node cluster dedicated to processing sorts of business data dumped out of database and joining them together. These data will then be fed into iSearch, our vertical search engine. * Each node has 8 cores, 16G RAM and 1.4T storage. - * [[http://aws.amazon.com/|Amazon Web Services]] - * We provide [[http://aws.amazon.com/elasticmapreduce|Amazon Elastic MapReduce]]. It's a web service that provides a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). - - * Our customers can instantly provision as much or as little capacity as they like to perform data-intensive tasks for applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. * [[http://aol.com/|AOL]] * We use hadoop for variety of things ranging from ETL style processing and statistics generation to running advanced algorithms for doing behavioral analysis and targeting. @@ -71, +67 @@ * [[http://www.benipaltechnologies.com|Benipal Technologies]] - Outsourcing, Consulting, Innovation * 35 Node Cluster (Core2Quad Q9400 Processor, 4-8 GB RAM, 500 GB HDD) - * Largest Data Node with Xeon E5420*2 Processors, 64GB RAM, 3.5 TB HDD * Total Cluster capacity of around 20 TB on a gigabit network with failover and redundancy * Hadoop is used for internal data crunching, application development, testing and getting around I/O limitations @@ -79, +74 @@ * [[http://bixolabs.com/|Bixo Labs]] - Elastic web mining * The Bixolabs elastic web mining platform uses Hadoop + Cascading to quickly build scalable web mining applications. * We're doing a 200M page/5TB crawl as part of the [[http://bixolabs.com/datasets/public-terabyte-dataset-project/|public terabyte dataset project]]. - * This runs as a 20 machine [[http://aws.amazon.com/elasticmapreduce/|Elastic MapReduce]] cluster. * [[http://www.brainpad.co.jp|BrainPad]] - Data mining and analysis @@ -87, +81 @@ * And use analyzing. = C = - - * [[http://www.cascading.org/|Cascading]] - Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on a Hadoop cluster. - - * [[http://www.cloudera.com|Cloudera, Inc]] - Cloudera provides commercial support and professional training for Hadoop. - * We provide [[http://www.cloudera.com/hadoop|Cloudera's Distribution for Hadoop]]. Stable packages for Redhat and Ubuntu (rpms / debs), EC2 Images and web based configuration. - * Check out our [[http://www.cloudera.com/blog|Hadoop and Big Data Blog]] - * Get [[http://oreilly.com/catalog/9780596521998/index.html|"Hadoop: The Definitive Guide"]] (Tom White/O'Reilly) * [[http://www.contextweb.com/|Contextweb]] - Ad Exchange * We use Hadoop to store ad serving logs and use it as a source for ad optimizations, analytics, reporting and machine learning. @@ -115, +102 @@ * We've developed [[http://rdfgrid.rubyforge.org/|RDFgrid]], a Ruby framework for map/reduce-based processing of RDF data. 
* We primarily use Ruby, [[http://rdf.rubyforge.org/|RDF.rb]] and RDFgrid to process RDF data with Hadoop Streaming. * We primarily run Hadoop jobs on Amazon Elastic MapReduce, with cluster sizes of 1 to 20 nodes depending on the size of the dataset (hundreds of millions to billions of RDF statements). - - * [[http://www.datameer.com|Datameer]] - * Datameer Analytics Solution (DAS) is the first Hadoop-based solution for big data analytics that includes data source integration, storage, an analytics engine and visualization. - * DAS Log File Aggregator is a plug-in to DAS that makes it easy to import large numbers of log files stored on disparate servers. * [[http://www.deepdyve.com|Deepdyve]] * Elastic cluster with 5-80 nodes @@ -234, +217 @@ * [[http://www.ibm.com|IBM]] * [[http://www-03.ibm.com/press/us/en/pressrelease/22613.wss|Blue Cloud Computing Clusters]] - * [[http://www-03.ibm.com/press/us/en/pressrelease/22414.wss|University Initiative to Address Internet-Scale Computing Challenges]] * [[http://www.iccs.informatics.ed.ac.uk/|ICCS]] @@ -268, +250 @@ * Using Hadoop MapReduce to analyse billions of lines of GPS data to create TrafficSpeeds, our accurate traffic speed forecast product. = K = - - * [[http://www.karmasphere.com/|Karmasphere]] - * Distributes [[http://www.hadoopstudio.org/|Karmasphere Studio for Hadoop]], which allows cross-version development and management of Hadoop jobs in a familiar integrated development environment. * [[http://katta.wiki.sourceforge.net/|Katta]] - Katta serves large Lucene indexes in a grid environment. * Uses Hadoop FileSytem, RPC and IO @@ -342, +321 @@ * 18 node cluster (Quad-Core AMD Opteron 2347, 1TB/node storage) * Powers data for search and aggregation - * [[http://lucene.apache.org/mahout|Mahout]] - . Another Apache project using Hadoop to build scalable machine learning algorithms like canopy clustering, k-means and many more to come (naive bayes classifiers, others) - * [[http://metrixcloud.com/|MetrixCloud]] - provides commercial support, installation, and hosting of Hadoop Clusters. [[http://metrixcloud.com/contact.php|Contact Us.]] = N = @@ -368, +344 @@ * We rely on Apache Pig for reporting, analytics, Cascading for machine learning, and on a proprietary JavaScript API for ad-hoc queries * We use commodity hardware, with 8 cores and 16 GB of RAM per machine - * [[http://lucene.apache.org/nutch|Nutch]] - flexible web search engine software - = O = = P = * [[http://parc.com|PARC]] - Used Hadoop to analyze Wikipedia conflicts [[http://asc.parc.googlepages.com/2007-10-28-VAST2007-RevertGraph-Wiki.pdf|paper]]. - * [[http://pentaho.com|Pentaho]] – Open Source Business Intelligence - * Pentaho provides the only complete, end-to-end open source BI alternative to proprietary offerings like Oracle, SAP and IBM - * We provide an easy-to-use, graphical ETL tool that is integrated with Hadoop for managing data and coordinating Hadoop related tasks in the broader context of your ETL and Business Intelligence workflow - * We also provide Reporting and Analysis capabilities against big data in Hadoop - * Learn more at [[http://www.pentaho.com/hadoop/|http://www.pentaho.com/hadoop]] * [[http://pharm2phork.org|Pharm2Phork Project]] - Agricultural Traceability * Using Hadoop on EC2 to process observation messages generated by RFID/Barcode readers as items move through supply chain.
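Several of the entries in the diff above run their jobs through Hadoop Streaming (for example, the RDFgrid entry processes RDF data with Ruby scripts over Streaming). For readers who have not seen the pattern, here is a minimal sketch of a Streaming job that counts predicate usage in an N-Triples file. It is illustrative only and not taken from any listed deployment: the script names, HDFS paths and input format are assumptions, it uses Python rather than the Ruby tooling mentioned above, and the exact path of the streaming jar varies by Hadoop version.

{{{
#!/usr/bin/env python
# mapper.py -- reads N-Triples lines from stdin and emits "<predicate>\t1" per triple.
# Illustrative example only; the N-Triples input format is an assumption.
import sys

for line in sys.stdin:
    parts = line.strip().split(None, 3)  # subject, predicate, object, trailing "."
    if len(parts) >= 3:
        print("%s\t%d" % (parts[1], 1))
}}}

{{{
#!/usr/bin/env python
# reducer.py -- Streaming delivers mapper output sorted by key, so counts can be
# summed per predicate with a simple running total.
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current_key:
        if current_key is not None:
            print("%s\t%d" % (current_key, total))
        current_key, total = key, 0
    total += int(value or 0)
if current_key is not None:
    print("%s\t%d" % (current_key, total))
}}}

A job like this would typically be launched with the streaming jar that ships with Hadoop, along these lines (paths are hypothetical):

{{{
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input /data/triples.nt \
    -output /data/predicate-counts \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py
}}}

The same pattern works with any executable that reads tab-separated key/value pairs on stdin and writes them to stdout, which is why entries above can use Ruby, Python or other scripting languages interchangeably.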
