Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "PoweredBy" page has been changed by YannickMorel.
http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=235&rev2=236

--------------------------------------------------

- This page documents an alphabetical list of institutions that are using 
Hadoop for educational or production uses.  Companies that offer services on or 
based around Hadoop are listed in [[Distributions and Commercial 
Support|Distributions and Commercial Support]] .
+ This page documents an alphabetical list of institutions that are using Hadoop for educational or production uses. Companies that offer services on or based around Hadoop are listed in [[Distributions and Commercial Support]].
  
  <<TableOfContents(3)>>
  
  = A =
- 
   * [[http://a9.com/|A9.com]] - Amazon *
    * We build Amazon's product search indices using the streaming API and 
pre-existing C++, Perl, and Python tools.
    * We process millions of sessions daily for analytics, using both the Java 
and streaming APIs.
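
  The streaming API mentioned in this entry lets existing scripts act as the map and reduce steps of a job: each script reads records from stdin and writes tab-separated key/value pairs to stdout, and Hadoop handles the sorting and shuffling in between. As a minimal illustrative sketch only (the input layout, file names and session-counting logic are invented for the example, not A9's actual pipeline), a Python mapper/reducer pair for counting sessions per day might look like this:

{{{#!python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming job: counts sessions per day.
# The input layout (tab-separated, date in the first column) is an
# assumption for illustration, not a real production data format.
import sys

def mapper():
    # Emit "date<TAB>1" for every session record read from stdin.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if not fields or not fields[0]:
            continue  # skip malformed records
        sys.stdout.write("%s\t1\n" % fields[0])

def reducer():
    # Hadoop Streaming sorts map output by key, so identical dates
    # arrive as a contiguous run of lines; sum the counts per run.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                sys.stdout.write("%s\t%d\n" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        sys.stdout.write("%s\t%d\n" % (current_key, count))

if __name__ == "__main__":
    # Launched via the streaming jar, e.g.:
    #   hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    #     -input /logs/sessions -output /reports/sessions-per-day \
    #     -mapper "sessions.py map" -reducer "sessions.py reduce" \
    #     -file sessions.py
    mapper() if sys.argv[1:] == ["map"] else reducer()
}}}

  The same pattern is what allows pre-existing C++, Perl or Python tools to plug in as the -mapper and -reducer commands without writing any Java.
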
@@ -38, +37 @@

    * A 15-node cluster dedicated to processing all sorts of business data dumped out of databases and joining them together. These data are then fed into iSearch, our vertical search engine.
    * Each node has 8 cores, 16GB RAM and 1.4TB storage.
  
- 
   * [[http://aol.com/|AOL]]
-   * We use hadoop for variety of things ranging from ETL style processing and 
statistics generation to running advanced algorithms for doing behavioral 
analysis and targeting. 
+   * We use Hadoop for a variety of things, ranging from ETL-style processing and statistics generation to running advanced algorithms for behavioral analysis and targeting.
    * The cluster that we use mainly for behavioral analysis and targeting has 150 machines (Intel Xeon, dual-processor, dual-core), each with 16GB RAM and an 800GB hard disk.
  
   * [[http://atbrox.com/|Atbrox]]
@@ -48, +46 @@

    * Cluster: we primarily use Amazon's Elastic MapReduce
  
  = B =
- 
   * [[http://www.babacar.org/|BabaCar]]
    * 4-node cluster (32 cores, 1TB).
    * We use Hadoop for searching and analysis of millions of rental bookings.
@@ -81, +78 @@

    * We also use it for analysis.
  
  = C =
- 
   * [[http://www.contextweb.com/|Contextweb]] - Ad Exchange
    * We use Hadoop to store ad serving logs and use it as a source for ad 
optimizations, analytics, reporting and machine learning.
    * Currently we have a 50 machine cluster with 400 cores and about 140TB raw 
storage. Each (commodity) node has 8 cores and 16GB of RAM.
@@ -98, +94 @@

    * [[http://www.springerlink.com/content/np5u8k1x9l6u755g|HDFS as a VM 
repository for virtual clusters]]
  
  = D =
- 
   * [[http://datagraph.org/|Datagraph]]
    * We use Hadoop for batch-processing large [[http://www.w3.org/RDF/|RDF]] 
datasets, in particular for indexing RDF data.
    * We also use Hadoop for executing long-running offline 
[[http://en.wikipedia.org/wiki/SPARQL|SPARQL]] queries for clients.
@@ -121, +116 @@

    * Eliminates the need for explicit data and schema mappings during database 
integration
  
  = E =
- 
   * [[http://www.ebay.com|EBay]]
    * 532-node cluster (8 * 532 cores, 5.3PB).
    * Heavy usage of Java MapReduce, Pig, Hive, HBase
@@ -147, +141 @@

    * Image based video copyright protection.
  
  = F =
- 
   * [[http://www.facebook.com/|Facebook]]
    * We use Hadoop to store copies of internal log and dimension data sources 
and use it as a source for reporting/analytics and machine learning.
    * Currently we have 2 major clusters:
@@ -177, +170 @@

    * We also use Hadoop to analyze similarities in users' behavior.
  
  = G =
- 
   * [[http://www.google.com|Google]]
    * 
[[http://www.google.com/intl/en/press/pressrel/20071008_ibm_univ.html|University
 Initiative to Address Internet-Scale Computing Challenges]]
  
@@ -194, +186 @@

    * Image and advertising analytics
  
  = H =
- 
   * [[http://www.hadoop.co.kr/|Hadoop Korean User Group]], a Korean Local 
Community Team Page.
    * 50-node cluster in the Korea University network environment.
     * Pentium 4 PC, HDFS 4TB Storage
@@ -218, +209 @@

    * We crawl our clients' websites, and from the information we gather we fingerprint old, un-updated software packages in that shared hosting environment. After matching a signature against a database, we can inform our clients that they are running outdated software. With that information we know which sites require patching, offered as a free courtesy service to protect the majority of users. Without the technologies of Nutch and Hadoop this would be a far harder task to accomplish.
  
  = I =
- 
   * [[http://www.ibm.com|IBM]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22613.wss|Blue Cloud 
Computing Clusters]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22414.wss|University 
Initiative to Address Internet-Scale Computing Challenges]]
@@ -246, +236 @@

    * Using a 10-node HDFS cluster to store and process retrieved data.
  
  = J =
- 
   * [[http://joost.com|Joost]]
    * Session analysis and report generation
  
@@ -254, +243 @@

    * Using Hadoop MapReduce to analyse billions of lines of GPS data to create 
TrafficSpeeds, our accurate traffic speed forecast product.
  
  = K =
- 
   * [[http://katta.wiki.sourceforge.net/|Katta]] - Katta serves large Lucene 
indexes in a grid environment.
    * Uses Hadoop FileSystem, RPC and IO
  
@@ -265, +253 @@

    * Source code search engine uses Hadoop and Nutch.
  
  = L =
- 
   * [[http://www.last.fm|Last.fm]]
    * 44 nodes
    * Dual quad-core Xeon L5520 (Nehalem) @ 2.27GHz, 16GB RAM, 4TB/node storage.
    * Used for charts calculation, log analysis, A/B testing
  
   * [[http://www.legolas-media.com|Legolas Media]]
-   * 20 dual quad-core nodes, 32GB RAM , 5x1TB 
+   * 20 dual quad-core nodes, 32GB RAM, 5x1TB
    * Used for user profile analysis, statistical analysis, and cookie-level reporting tools.
-   * Some Hive but mainly automated Java MapReduce jobs that process ~150MM 
new events/day. 
+   * Some Hive but mainly automated Java MapReduce jobs that process ~150MM 
new events/day.
  
   * [[https://lbg.unc.edu|Lineberger Comprehensive Cancer Center - 
Bioinformatics Group]] This is the cancer center at UNC Chapel Hill. We are 
using Hadoop/HBase for databasing and analyzing Next Generation Sequencing 
(NGS) data produced for the [[http://cancergenome.nih.gov/|Cancer Genome 
Atlas]] (TCGA) project and other groups. This development is based on the 
[[http://seqware.sf.net|SeqWare]] open source project which includes SeqWare 
Query Engine, a database and web service built on top of HBase that stores 
sequence data types. Our prototype cluster includes:
    * 8 dual quad core nodes running CentOS
@@ -283, +270 @@

  
   * [[http://www.linkedin.com|LinkedIn]]
    * We have multiple grids divided up based upon purpose.  They are composed 
of the following types of hardware:
-     * 100 Nehalem-based nodes, with 2x4 cores, 24GB RAM, 8x1TB storage using 
ZFS in a JBOD configuration on Solaris.
+    * 100 Nehalem-based nodes, with 2x4 cores, 24GB RAM, 8x1TB storage using 
ZFS in a JBOD configuration on Solaris.
-     * 120 Westmere-based nodes, with 2x4 cores, 24GB RAM, 6x2TB storage using 
ext4 in a JBOD configuration on CentOS 5.5
+    * 120 Westmere-based nodes, with 2x4 cores, 24GB RAM, 6x2TB storage using 
ext4 in a JBOD configuration on CentOS 5.5
    * We use Hadoop and Pig for discovering People You May Know and other fun 
facts.
  
   * [[http://www.lookery.com|Lookery]]
@@ -295, +282 @@

    * Using Hadoop and HBase for storage, log analysis, and pattern discovery/analysis.
  
  = M =
- 
   * [[http://www.markt24.de/|Markt24]]
    * We use Hadoop to filter user behaviour, recommendations and trends from external sites.
    * Using zkpython
@@ -333, +319 @@

   * [[http://metrixcloud.com/|MetrixCloud]] - provides commercial support, 
installation, and hosting of Hadoop Clusters. 
[[http://metrixcloud.com/contact.php|Contact Us.]]
  
  = N =
- 
   * [[http://www.openneptune.com|Neptune]]
    * Another Bigtable cloning project using Hadoop to store large structured data sets.
    * 200 nodes (each node has 2 dual-core CPUs, 2TB storage, 4GB RAM)
@@ -354, +339 @@

    * We use commodity hardware, with 8 cores and 16 GB of RAM per machine
  
  = O =
- 
  = P =
- 
   * [[http://parc.com|PARC]] - Used Hadoop to analyze Wikipedia conflicts 
[[http://asc.parc.googlepages.com/2007-10-28-VAST2007-RevertGraph-Wiki.pdf|paper]].
- 
  
   * [[http://pharm2phork.org|Pharm2Phork Project]] - Agricultural Traceability
    * Using Hadoop on EC2 to process observation messages generated by 
RFID/Barcode readers as items move through supply chain.
@@ -383, +365 @@

    * Our cluster size varies from 5 to 10 nodes. Cluster nodes range from 2950 Quad Core Rack Servers with 2x6MB cache and 4x500GB SATA hard drives to E7200/E7400 processors with 4GB RAM and a 160GB HDD.
  
  = Q =
- 
   * [[http://www.quantcast.com/|Quantcast]]
    * 3000 cores, 3500TB. 1PB+ processing each day.
    * Hadoop scheduler with fully custom data path / sorter
    * Significant contributions to KFS filesystem
  
  = R =
- 
   * [[http://www.rackspace.com/email_hosting/|Rackspace]]
    * 30 node cluster (Dual-Core, 4-8GB RAM, 1.5TB/node storage)
     * Parses and indexes logs from email hosting system for search: 
http://blog.racklabs.com/?p=66
@@ -409, +389 @@

    * We intend to parallelize some traditional classification and clustering algorithms, such as Naive Bayes, K-Means and EM, so that they can deal with large-scale data sets.
  
  = S =
- 
   * 
[[http://www.sara.nl/news/recent/20101103/Hadoop_proof-of-concept.html|SARA, 
Netherlands]]
    * SARA has initiated a Proof-of-Concept project to evaluate the Hadoop 
software stack for scientific use.
  
@@ -444, +423 @@

    * Hosted Hadoop data warehouse solution provider
  
  = T =
- 
   * [[http://www.taragana.com|Taragana]] - Web 2.0 Product development and 
outsourcing services
    * We are using 16 consumer-grade computers to create the cluster, connected by a 100 Mbps network.
    * Used for testing ideas for blog and other data mining.
@@ -480, +458 @@

    * We have 94 nodes (752 cores) in our clusters, as of July 2010, but the 
number grows regularly.
  
  = U =
- 
   * [[http://glud.udistrital.edu.co|Universidad Distrital Francisco Jose de Caldas (Grupo GICOGE/Grupo Linux UD GLUD/Grupo GIGA)]]
-   5 node low-profile cluster. We use Hadoop to support the research project: 
Territorial Intelligence System of Bogota City.
+   . 5 node low-profile cluster. We use Hadoop to support the research 
project: Territorial Intelligence System of Bogota City.
  
   * [[http://ir.dcs.gla.ac.uk/terrier/|University of Glasgow - Terrier Team]]
    * 30-node cluster (Xeon Quad Core 2.4GHz, 4GB RAM, 1TB/node storage).
@@ -495, +472 @@

    . We currently run one medium-sized Hadoop cluster (200TB) to store and 
serve up physics data for the computing portion of the Compact Muon Solenoid 
(CMS) experiment. This requires a filesystem which can download data at 
multiple Gbps and process data at an even higher rate locally. Additionally, 
several of our students are involved in research projects on Hadoop.
  
  = V =
- 
   * [[http://www.veoh.com|Veoh]]
    * We use a small Hadoop cluster to reduce usage data for internal metrics, 
for search indexing and for recommendation data.
  
@@ -506, +482 @@

    * We also use Hadoop for filtering and indexing listings, for log analysis, and for recommendation data.
  
  = W =
- 
+  * [[http://www.web-alliance.fr|Web Alliance]]
+   * We use Hadoop for our internal search engine optimization (SEO) tools. It allows us to store, index and search data much faster.
+   * We also use it for log analysis and trend prediction.
   * [[http://www.worldlingo.com/|WorldLingo]]
    * Hardware: 44 servers (each server has: 2 dual core CPUs, 2TB storage, 8GB 
RAM)
    * Each server runs Xen with one Hadoop/HBase instance and another instance 
with web or application servers, giving us 88 usable virtual machines.
@@ -516, +494 @@

    * Currently we store 12 million documents with a target of 450 million in the near future.
  
  = X =
- 
  = Y =
- 
   * [[http://www.yahoo.com/|Yahoo!]]
    * More than 100,000 CPUs in >36,000 computers running Hadoop
    * Our biggest cluster: 4000 nodes (2*4 CPU boxes with 4*1TB disk & 16GB RAM)
@@ -528, +504 @@

    * >60% of Hadoop Jobs within Yahoo are Pig jobs.
  
  = Z =
- 
   * [[http://www.zvents.com/|Zvents]]
    * 10 node cluster (Dual-Core AMD Opteron 2210, 4GB RAM, 1TB/node storage)
    * Run Naive Bayes classifiers in parallel over crawl data to discover event 
information
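
  For readers curious how this kind of classification parallelizes: scoring is the embarrassingly parallel part, since a pre-trained model can be shipped to every mapper and applied to each crawled document independently. The sketch below is a hedged illustration only; the feature tokens, log-probability values, labels and input format are invented for the example and are not Zvents' actual model or data.

{{{#!python
#!/usr/bin/env python
# Hypothetical map-side Naive Bayes scorer for a Hadoop Streaming job:
# each mapper loads a pre-trained model and labels crawl records
# independently, so the scoring work spreads across the whole cluster.
import json
import math
import sys

# Toy model: log-priors plus per-class log-likelihoods for a few tokens.
# A real model would be trained offline and shipped with the job
# (for example via the -file option / distributed cache).
MODEL = {
    "event":     {"prior": math.log(0.3),
                  "tokens": {"concert": -1.0, "tickets": -1.2, "tonight": -1.5}},
    "non_event": {"prior": math.log(0.7),
                  "tokens": {"concert": -4.0, "tickets": -3.5, "tonight": -3.0}},
}
UNSEEN = -6.0  # smoothed log-likelihood for tokens absent from the model

def classify(text):
    """Return the most likely label for a whitespace-tokenized document."""
    best_label, best_score = None, float("-inf")
    for label, params in MODEL.items():
        score = params["prior"] + sum(params["tokens"].get(token, UNSEEN)
                                      for token in text.lower().split())
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    # Assumed input: one JSON record per line with "url" and "text" fields.
    # Output: url<TAB>label, ready for a downstream aggregation step.
    for line in sys.stdin:
        record = json.loads(line)
        sys.stdout.write("%s\t%s\n" % (record["url"], classify(record["text"])))
}}}
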
