Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "PoweredBy" page has been changed by SomeOtherAccount. http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=222&rev2=223 -------------------------------------------------- - Applications and organizations using Hadoop include (alphabetically): + This page documents an alphabetical list of institutions that are using Hadoop for educational or production uses. Companies that offer services on or based around Hadoop are listed in Distributions. <<TableOfContents(3)>> @@ -38, +38 @@ * A 15-node cluster dedicated to processing sorts of business data dumped out of database and joining them together. These data will then be fed into iSearch, our vertical search engine. * Each node has 8 cores, 16G RAM and 1.4T storage. - * [[http://aws.amazon.com/|Amazon Web Services]] - * We provide [[http://aws.amazon.com/elasticmapreduce|Amazon Elastic MapReduce]]. It's a web service that provides a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). - - * Our customers can instantly provision as much or as little capacity as they like to perform data-intensive tasks for applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. * [[http://aol.com/|AOL]] * We use hadoop for variety of things ranging from ETL style processing and statistics generation to running advanced algorithms for doing behavioral analysis and targeting. @@ -71, +67 @@ * [[http://www.benipaltechnologies.com|Benipal Technologies]] - Outsourcing, Consulting, Innovation * 35 Node Cluster (Core2Quad Q9400 Processor, 4-8 GB RAM, 500 GB HDD) - * Largest Data Node with Xeon E5420*2 Processors, 64GB RAM, 3.5 TB HDD * Total Cluster capacity of around 20 TB on a gigabit network with failover and redundancy * Hadoop is used for internal data crunching, application development, testing and getting around I/O limitations @@ -79, +74 @@ * [[http://bixolabs.com/|Bixo Labs]] - Elastic web mining * The Bixolabs elastic web mining platform uses Hadoop + Cascading to quickly build scalable web mining applications. * We're doing a 200M page/5TB crawl as part of the [[http://bixolabs.com/datasets/public-terabyte-dataset-project/|public terabyte dataset project]]. - * This runs as a 20 machine [[http://aws.amazon.com/elasticmapreduce/|Elastic MapReduce]] cluster. * [[http://www.brainpad.co.jp|BrainPad]] - Data mining and analysis @@ -87, +81 @@ * And use analyzing. = C = - - * [[http://www.cascading.org/|Cascading]] - Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on a Hadoop cluster. - - * [[http://www.cloudera.com|Cloudera, Inc]] - Cloudera provides commercial support and professional training for Hadoop. - * We provide [[http://www.cloudera.com/hadoop|Cloudera's Distribution for Hadoop]]. Stable packages for Redhat and Ubuntu (rpms / debs), EC2 Images and web based configuration. - * Check out our [[http://www.cloudera.com/blog|Hadoop and Big Data Blog]] - * Get [[http://oreilly.com/catalog/9780596521998/index.html|"Hadoop: The Definitive Guide"]] (Tom White/O'Reilly) * [[http://www.contextweb.com/|Contextweb]] - Ad Exchange * We use Hadoop to store ad serving logs and use it as a source for ad optimizations, analytics, reporting and machine learning. @@ -115, +102 @@ * We've developed [[http://rdfgrid.rubyforge.org/|RDFgrid]], a Ruby framework for map/reduce-based processing of RDF data. 
* We primarily use Ruby, [[http://rdf.rubyforge.org/|RDF.rb]] and RDFgrid to process RDF data with Hadoop Streaming. * We primarily run Hadoop jobs on Amazon Elastic MapReduce, with cluster sizes of 1 to 20 nodes depending on the size of the dataset (hundreds of millions to billions of RDF statements). - - * [[http://www.datameer.com|Datameer]] - * Datameer Analytics Solution (DAS) is the first Hadoop-based solution for big data analytics that includes data source integration, storage, an analytics engine and visualization. - * DAS Log File Aggregator is a plug-in to DAS that makes it easy to import large numbers of log files stored on disparate servers. * [[http://www.deepdyve.com|Deepdyve]] * Elastic cluster with 5-80 nodes @@ -234, +217 @@ * [[http://www.ibm.com|IBM]] * [[http://www-03.ibm.com/press/us/en/pressrelease/22613.wss|Blue Cloud Computing Clusters]] - * [[http://www-03.ibm.com/press/us/en/pressrelease/22414.wss|University Initiative to Address Internet-Scale Computing Challenges]] * [[http://www.iccs.informatics.ed.ac.uk/|ICCS]] @@ -268, +250 @@ * Using Hadoop MapReduce to analyse billions of lines of GPS data to create TrafficSpeeds, our accurate traffic speed forecast product. = K = - - * [[http://www.karmasphere.com/|Karmasphere]] - * Distributes [[http://www.hadoopstudio.org/|Karmasphere Studio for Hadoop]], which allows cross-version development and management of Hadoop jobs in a familiar integrated development environment. * [[http://katta.wiki.sourceforge.net/|Katta]] - Katta serves large Lucene indexes in a grid environment. * Uses Hadoop FileSytem, RPC and IO @@ -342, +321 @@ * 18 node cluster (Quad-Core AMD Opteron 2347, 1TB/node storage) * Powers data for search and aggregation - * [[http://lucene.apache.org/mahout|Mahout]] - . Another Apache project using Hadoop to build scalable machine learning algorithms like canopy clustering, k-means and many more to come (naive bayes classifiers, others) - * [[http://metrixcloud.com/|MetrixCloud]] - provides commercial support, installation, and hosting of Hadoop Clusters. [[http://metrixcloud.com/contact.php|Contact Us.]] = N = @@ -368, +344 @@ * We rely on Apache Pig for reporting, analytics, Cascading for machine learning, and on a proprietary JavaScript API for ad-hoc queries * We use commodity hardware, with 8 cores and 16 GB of RAM per machine - * [[http://lucene.apache.org/nutch|Nutch]] - flexible web search engine software - = O = = P = * [[http://parc.com|PARC]] - Used Hadoop to analyze Wikipedia conflicts [[http://asc.parc.googlepages.com/2007-10-28-VAST2007-RevertGraph-Wiki.pdf|paper]]. - * [[http://pentaho.com|Pentaho]] – Open Source Business Intelligence - * Pentaho provides the only complete, end-to-end open source BI alternative to proprietary offerings like Oracle, SAP and IBM - * We provide an easy-to-use, graphical ETL tool that is integrated with Hadoop for managing data and coordinating Hadoop related tasks in the broader context of your ETL and Business Intelligence workflow - * We also provide Reporting and Analysis capabilities against big data in Hadoop - * Learn more at [[http://www.pentaho.com/hadoop/|http://www.pentaho.com/hadoop]] * [[http://pharm2phork.org|Pharm2Phork Project]] - Agricultural Traceability * Using Hadoop on EC2 to process observation messages generated by RFID/Barcode readers as items move through supply chain.
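Several of the entries in the diff above run their jobs through Hadoop Streaming (for example, the RDFgrid entry processes RDF data with Ruby scripts over Streaming). For readers who have not seen the pattern, here is a minimal sketch of a Streaming job that counts predicate usage in an N-Triples file. It is illustrative only and not taken from any listed deployment: the script names, HDFS paths and input format are assumptions, it uses Python rather than the Ruby tooling mentioned above, and the exact path of the streaming jar varies by Hadoop version.

{{{
#!/usr/bin/env python
# mapper.py -- reads N-Triples lines from stdin and emits "<predicate>\t1" per triple.
# Illustrative example only; the N-Triples input format is an assumption.
import sys

for line in sys.stdin:
    parts = line.strip().split(None, 3)  # subject, predicate, object, trailing "."
    if len(parts) >= 3:
        print("%s\t%d" % (parts[1], 1))
}}}

{{{
#!/usr/bin/env python
# reducer.py -- Streaming delivers mapper output sorted by key, so counts can be
# summed per predicate with a simple running total.
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current_key:
        if current_key is not None:
            print("%s\t%d" % (current_key, total))
        current_key, total = key, 0
    total += int(value or 0)
if current_key is not None:
    print("%s\t%d" % (current_key, total))
}}}

A job like this would typically be launched with the streaming jar that ships with Hadoop, along these lines (paths are hypothetical):

{{{
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input /data/triples.nt \
    -output /data/predicate-counts \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py
}}}

The same pattern works with any executable that reads tab-separated key/value pairs on stdin and writes them to stdout, which is why entries above can use Ruby, Python or other scripting languages interchangeably.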
