Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by BradfordStephens:
http://wiki.apache.org/hadoop/PoweredBy

The comment on the change is:
Added information about Visible Technologies and Hadoop

------------------------------------------------------------------------------
    * A 15-node cluster dedicated to processing various sorts of business data dumped out of databases and joining them together. These data will then be fed into iSearch, our vertical search engine.
    * Each node has 8 cores, 16GB RAM and 1.4TB storage.
  
-  * [http://aol.com/ AOL] 
+  * [http://aol.com/ AOL]
   * We use Hadoop for a variety of things, ranging from ETL-style processing and statistics generation to running advanced algorithms for behavioral analysis and targeting.
   * Our cluster is 50 machines (Intel Xeon, dual processor, dual core), each with 16GB RAM and an 800GB hard disk, giving us a total of 37TB of HDFS capacity.
  
@@ -44, +44 @@

    * We're writing [http://oreilly.com/catalog/9780596521998/index.html 
"Hadoop: The Definitive Guide"] (Tom White/O'Reilly)
  
  
-  * [http://www.contextweb.com/ Contextweb] - ADSDAQ Ad Excange 
+  * [http://www.contextweb.com/ Contextweb] - ADSDAQ Ad Exchange
-   * We use Hadoop to store ad serving log and use it as a source for Ad 
optimizations/Analytics/reporting/machine learning. 
+   * We use Hadoop to store ad serving logs and use them as a source for ad optimization/analytics/reporting/machine learning.
   * Currently we have a 23-machine cluster with 184 cores and about 35TB of raw storage. Each (commodity) node has 8 cores, 8GB RAM and 1.7TB of storage.
  
   * [http://www.weblab.infosci.cornell.edu/ Cornell University Web Lab]
@@ -65, +65 @@

    * Image content based advertising and auto-tagging for social media.
    * Image based video copyright protection.
  
-  * [http://www.facebook.com/ Facebook] 
+  * [http://www.facebook.com/ Facebook]
-   * We use Hadoop to store copies of internal log and dimension data sources 
and use it as a source for reporting/analytics and machine learning. 
+   * We use Hadoop to store copies of internal log and dimension data sources 
and use it as a source for reporting/analytics and machine learning.
   * We currently have a 600-machine cluster with 4800 cores and about 2PB of raw storage. Each (commodity) node has 8 cores and 4TB of storage.
   * We are heavy users of both streaming and the Java APIs. We have built a higher-level data warehousing framework using these features, called Hive (see [http://hadoop.apache.org/hive/]). We have also developed a FUSE implementation over HDFS; a minimal sketch of the Java file-system API follows below.
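   As an illustration of the Java file-system API mentioned above (the same interface a FUSE layer ultimately wraps), here is a minimal sketch that lists a log directory in HDFS and prints the first line of each file. The directory layout and class name are invented for the example and are not Facebook's code.

{{{
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLogPeek {
  public static void main(String[] args) throws Exception {
    // Picks up the cluster configured in core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical log directory; list it and peek at the first line of each file.
    Path logDir = new Path(args.length > 0 ? args[0] : "/logs/2009-06-01");
    for (FileStatus status : fs.listStatus(logDir)) {
      if (status.isDir()) {
        continue;   // skip subdirectories
      }
      FSDataInputStream in = fs.open(status.getPath());
      try {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        System.out.println(status.getPath() + ": " + reader.readLine());
      } finally {
        in.close();
      }
    }
  }
}
}}}

   It can be run from any JVM that has the Hadoop jars and the cluster configuration on its classpath, for example via the hadoop jar command.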
  
@@ -76, +76 @@

    * Use for log analysis, data mining and machine learning
  
   * [http://www.hadoop.co.kr/ Hadoop Korean User Group], a Korean Local 
Community Team Page.
-   * 50 node cluster In the Korea university network environment. 
+   * 50 node cluster in the Korea university network environment.
    * Pentium 4 PCs, 4TB of HDFS storage
    * Used for development projects
     * Retrieving and Analyzing Biomedical Knowledge
@@ -103, +103 @@

    Hadoop is also beginning to be used in our teaching and general research
    activities on natural language processing and machine learning.
  
-  * [http://search.iiit.ac.in/ IIIT, Hyderabad] 
+  * [http://search.iiit.ac.in/ IIIT, Hyderabad]
   * We use Hadoop for Information Retrieval and Extraction research projects, and we are also working on map-reduce scheduling research for multi-job environments.
   * Our cluster sizes vary from 10 to 30 nodes, depending on the jobs. The nodes are heterogeneous, most being Quad 6600s with 4GB RAM and 1TB of disk per node, plus some nodes with dual-core and single-core configurations.
  
   * [http://www.imageshack.us/ ImageShack]
    * From 
[http://www.techcrunch.com/2008/05/20/update-imageshack-ceo-hints-at-his-grander-ambitions/
 TechCrunch]:
-     Rather than put ads in or around the images it hosts, Levin is working on 
harnessing all the data his 
+     Rather than put ads in or around the images it hosts, Levin is working on 
harnessing all the data his
-     service generates about content consumption (perhaps to better target 
advertising on ImageShack or to 
+     service generates about content consumption (perhaps to better target 
advertising on ImageShack or to
     syndicate that targeting data to ad networks). Like Google and Yahoo, he is deploying the open-source
      Hadoop software to create a massive distributed supercomputer, but he is 
using it to analyze all the
      data he is collecting.
@@ -125, +125 @@

    * Session analysis and report generation
  
   * [http://katta.wiki.sourceforge.net/ Katta] - Katta serves large Lucene 
indexes in a grid environment.
-    * Uses Hadoop FileSytem, RPC and IO 
+    * Uses Hadoop FileSystem, RPC and IO
  
   * [http://www.koubei.com/ Koubei.com ] Large local community and local search in China.
     Using Hadoop to process Apache logs: analyzing users' actions and click flow, the links clicked from any given page on the site, and more. Also using map/reduce to process all of the price data entered by users (a minimal sketch of such a log-processing job follows below).
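     As a rough illustration of such a log-processing job (and not Koubei's actual code), here is a minimal map/reduce sketch that counts clicks per requested page from Apache access logs using the standard Hadoop Java API; the log-format parsing and paths are simplified assumptions.

{{{
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageClickCount {

  // Map: one access-log line in, (requested page, 1) out.
  public static class ClickMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text page = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // Common log format: host ident user [date] "GET /path HTTP/1.0" status bytes
      String[] quoted = line.toString().split("\"");
      if (quoted.length > 1) {
        String[] request = quoted[1].split(" ");
        if (request.length > 1) {
          page.set(request[1]);       // the requested path
          context.write(page, ONE);
        }
      }
    }
  }

  // Reduce (and combine): sum the clicks for each page.
  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text page, Iterable<LongWritable> counts, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable count : counts) {
        sum += count.get();
      }
      context.write(page, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "page click count");
    job.setJarByClass(PageClickCount.class);
    job.setMapperClass(ClickMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /logs/access/
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /reports/clicks/
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
}}}

     A job like this would be submitted along the lines of: hadoop jar clickcount.jar PageClickCount /logs/access /reports/clicks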
  
   * [http://krugle.com/ Krugle]
-   * Source code search engine uses Hadoop and Nutch. 
+   * Source code search engine uses Hadoop and Nutch.
  
   * [http://www.last.fm Last.fm]
   * 50 nodes (dual Xeon LV 2GHz, 4GB RAM, 1TB/node storage and dual Xeon L5320 1.86GHz, 8GB RAM, 3TB/node storage).
@@ -155, +155 @@

    * Another Bigtable cloning project using Hadoop to store large structured data sets.
    * 200 nodes (each node has 2 dual-core CPUs, 2TB storage, 4GB RAM)
  
-  * [http://www.netseer.com NetSeer] - 
+  * [http://www.netseer.com NetSeer] -
    * Up to 1000 instances on 
[http://www.amazon.com/b/ref=sc_fe_l_2/002-1156069-5604805?ie=UTF8&node=201590011&no=3435361&me=A36L942TSJ2AJA
 Amazon EC2]
    * Data storage in 
[http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2/002-1156069-5604805?ie=UTF8&node=16427261&no=3435361&me=A36L942TSJ2AJA
 Amazon S3]
   * 50 node cluster in a colocation facility
@@ -163, +163 @@

  
   * [http://nytimes.com The New York Times]
    * 
[http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
 Large scale image conversions]
-   * Used EC2 to run hadoop on a large virtual cluster 
+   * Used EC2 to run hadoop on a large virtual cluster
  
   * [http://www.ning.com Ning]
    * We use Hadoop to store and process our log files
@@ -224, +224 @@

     We are one of six universities participating in IBM/Google's academic
     cloud computing initiative.  Ongoing research and teaching efforts
     include projects in machine translation, language modeling,
-    bioinformatics, email analysis, and image processing. 
+    bioinformatics, email analysis, and image processing.
  
   * [http://t2.unl.edu University of Nebraska Lincoln, Research Computing 
Facility]
     We currently run one medium-sized Hadoop cluster (200TB) to store and 
serve up physics data
@@ -233, +233 @@

     several of our students are involved in research projects on Hadoop.
  
   * [http://www.veoh.com Veoh]
-   * We use a small Hadoop cluster to reduce usage data for internal metrics, 
for search indexing and for recommendation data. 
+   * We use a small Hadoop cluster to reduce usage data for internal metrics, 
for search indexing and for recommendation data.
  
   * [http://www.visiblemeasures.com Visible Measures Corporation] uses Hadoop 
as a component in our Scalable Data Pipeline, which ultimately powers 
!VisibleSuite and other products.  We use Hadoop to aggregate, store, and 
analyze data related to in-stream viewing behavior of Internet video audiences. 
  Our current grid contains more than 128 CPU cores and in excess of 100 
terabytes of storage, and we plan to grow that substantially during 2008.
+ 
+  * [http://www.visibletechnologies.com Visible Technologies] Hadoop is 
quickly becoming the core of our business. We use it to extract Business 
Intelligence out of Consumer Generated Media.
+   * Running on over 150 servers through 2009
+   * We use Nutch to crawl and index HTML pages, Lucene and HBase to store documents, Solr to search, ZooKeeper to manage search shards, and possibly Mahout for semantic machine learning
+   * Many BI-related tasks run on Hadoop to extract meaningful data (topics, authors, keywords, link graphs, etc.)
+ 
  
   * [http://www.vksolutions.com/ VK Solutions]
   * We use a small Hadoop cluster in the scope of our general research activities at [http://www.vklabs.com VK Labs] to get faster data access from web applications.
-   * We also use Hadoop for filtering and indexing listing, processing log 
analysis, and for recommendation data.  
+   * We also use Hadoop for filtering and indexing listings, for log analysis, and for recommendation data.
  
   * [http://www.worldlingo.com/ WorldLingo]
   * Hardware: 44 servers (each server has 2 dual-core CPUs, 2TB storage, 8GB RAM)
    * Each server runs Xen with one Hadoop/HBase instance and another instance 
with web or application servers, giving us 88 usable virtual machines.
    * We run two separate Hadoop/HBase clusters with 22 nodes each.
   * Hadoop is primarily used to run HBase and Map/Reduce jobs scanning over the HBase tables to perform specific tasks (see the sketch after this list).
-   * HBase is used as a scalable and fast storage back end for millions of 
documents. 
+   * HBase is used as a scalable and fast storage back end for millions of 
documents.
   * Currently we store 12 million documents, with a target of 450 million in the near future.
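   As an illustration of a Map/Reduce job scanning over an HBase table, here is a minimal sketch that counts stored documents per language; the table name, column family and the task itself are invented for the example and are not WorldLingo's actual code.

{{{
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DocumentLanguageCount {

  // Map: one HBase row (a stored document) in, (language, 1) out.
  public static class LanguageMapper extends TableMapper<Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);

    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      byte[] lang = result.getValue(Bytes.toBytes("meta"), Bytes.toBytes("language"));
      if (lang != null) {
        context.write(new Text(Bytes.toString(lang)), ONE);
      }
    }
  }

  // Reduce: sum the documents seen for each language.
  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text lang, Iterable<LongWritable> counts, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable count : counts) {
        sum += count.get();
      }
      context.write(lang, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "document language count");
    job.setJarByClass(DocumentLanguageCount.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("meta"));   // only fetch the column family we need
    scan.setCaching(500);                    // larger scanner batches suit a full scan
    TableMapReduceUtil.initTableMapperJob(
        "documents", scan, LanguageMapper.class,
        Text.class, LongWritable.class, job);

    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
}}}

   TableInputFormat splits the scan by region, so each map task reads mostly local data, which is what makes full-table scans like this practical.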
  
   * [http://www.yahoo.com/ Yahoo!]
