Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "PoweredBy" page has been changed by vuelos.
http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=158&rev2=159

--------------------------------------------------

  Applications and organizations using Hadoop include (alphabetically):
+ 
   * [[http://a9.com/|A9.com]] - Amazon
    * We build Amazon's product search indices using the streaming API and 
pre-existing C++, Perl, and Python tools.
    * We process millions of sessions daily for analytics, using both the Java 
and streaming APIs.
    * Our clusters vary from 1 to 100 nodes.
  
   * [[http://www.adobe.com|Adobe]]
-   * We use Hadoop and HBase in several areas from social services to 
structured data storage and processing for internal use. 
+   * We use Hadoop and HBase in several areas from social services to 
structured data storage and processing for internal use.
   * We currently have about 30 nodes running HDFS, Hadoop and HBase in clusters ranging from 5 to 14 nodes across both production and development. We plan a deployment on an 80-node cluster.
   * We constantly write data to HBase and run MapReduce jobs to process it, then store the results back to HBase or external systems.
    * Our production cluster has been running since Oct 2008.
@@ -65, +66 @@

    * Generating web graphs on 100 nodes (dual 2.4GHz Xeon Processor, 2 GB RAM, 
72GB Hard Drive)
  
   * [[http://www.deepdyve.com|Deepdyve]]
-   * Elastic cluster with 5-80 nodes 
+   * Elastic cluster with 5-80 nodes
   * We use Hadoop to create our indexes of deep web content and to provide a high-availability, high-bandwidth storage service for index shards for our search cluster.
  
   * [[http://search.detik.com|Detikcom]] - Indonesia's largest news portal
@@ -99, +100 @@

   * [[http://www.facebook.com/|Facebook]]
    * We use Hadoop to store copies of internal log and dimension data sources 
and use it as a source for reporting/analytics and machine learning.
   * We currently have a 600-machine cluster with 4800 cores and about 2 PB of raw storage. Each (commodity) node has 8 cores and 4 TB of storage.
-   * We are heavy users of both streaming as well as the Java apis. We have 
built a higher level data warehousing framework using these features called 
Hive (see the [[http://hadoop.apache.org/hive/]]).  We have also developed a 
FUSE implementation over hdfs.
+   * We are heavy users of both the streaming and Java APIs. We have built a higher-level data warehousing framework called Hive using these features (see http://hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS.
  
   * [[http://www.foxaudiencenetwork.com|FOX Audience Network]]
    * 40 machine cluster (8 cores/machine, 2TB/machine storage)
@@ -135, +136 @@

  
   * [[http://www.hadoop.tw/|Hadoop Taiwan User Group]]
  
-  * [[http://holaservers.com/|HolaServers.com]]
-   * Hosting company
-   * Use pig to provide traffic stats to users in near real time
+  * [[http://net-ngo.com|Hipotecas y euribor]] (Mortgages and Euribor)
+   * Evolution of the Euribor and its current value
+   * Mortgage simulator for the economic crisis
  
   * [[http://www.hostinghabitat.com/|Hosting Habitat]]
-   * We use a customised version of Hadoop and Nutch in a currently 
experimental 6 node/Dual Core cluster environment. 
+   * We use a customised version of Hadoop and Nutch in a currently 
experimental 6 node/Dual Core cluster environment.
-   * What we crawl are our clients Websites and from the information we 
gather. We fingerprint old and non updated software packages in that shared 
hosting environment. We can then inform our clients that they have old and non 
updated software running after matching a signature to a Database. With that 
information we know which sites would require patching as a free and courtesy 
service to protect the majority of users. Without the technologies of Nutch and 
Hadoop this would be a far harder to accomplish task. 
+   * We crawl our clients' websites, and from the information we gather we fingerprint old and out-of-date software packages in that shared hosting environment. We can then inform our clients that they are running outdated software after matching a signature against a database. With that information we know which sites require patching, which we offer as a free courtesy service to protect the majority of users. Without the technologies of Nutch and Hadoop this task would be far harder to accomplish.
  
   * [[http://www.ibm.com|IBM]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22613.wss|Blue Cloud 
Computing Clusters]]
    * [[http://www-03.ibm.com/press/us/en/pressrelease/22414.wss|University 
Initiative to Address Internet-Scale Computing Challenges]]
  
   * [[http://www.iccs.informatics.ed.ac.uk/|ICCS]]
+   * We are using Hadoop and Nutch to crawl blog posts and process them later. Hadoop is also beginning to be used in our teaching and general research activities on natural language processing and machine learning.
-   * We are using Hadoop and Nutch to crawl Blog posts and later process them.
-   Hadoop is also beginning to be used in our teaching and general research
-   activities on natural language processing and machine learning.
  
   * [[http://search.iiit.ac.in/|IIIT, Hyderabad]]
   * We use Hadoop for Information Retrieval and Extraction research projects. We are also working on MapReduce scheduling research for multi-job environments.
@@ -158, +157 @@

  
   * [[http://www.imageshack.us/|ImageShack]]
    * From 
[[http://www.techcrunch.com/2008/05/20/update-imageshack-ceo-hints-at-his-grander-ambitions/|TechCrunch]]:
-     Rather than put ads in or around the images it hosts, Levin is working on 
harnessing all the data his
+    . Rather than put ads in or around the images it hosts, Levin is working on harnessing all the data his
+    service generates about content consumption (perhaps to better target advertising on ImageShack or to syndicate that targeting data to ad networks). Like Google and Yahoo, he is deploying the open-source Hadoop software to create a massive distributed supercomputer, but he is using it to analyze all the data he is collecting.
-     service generates about content consumption (perhaps to better target 
advertising on ImageShack or to
-     syndicate that targetting data to ad networks). Like Google and Yahoo, he 
is deploying the open-source
-     Hadoop software to create a massive distributed supercomputer, but he is 
using it to analyze all the
-     data he is collecting.
  
   * [[http://www.isi.edu/|Information Sciences Institute (ISI)]]
    * Used Hadoop and 18 nodes/52 cores to 
[[http://www.isi.edu/ant/address/whole_internet/|plot the entire internet]].
@@ -174, +170 @@

    * Session analysis and report generation
  
   * [[http://www.journeydynamics.com|Journey Dynamics]]
-   * Using Hadoop MapReduce to analyse billions of lines of GPS data to create 
Traffic``Speeds, our accurate traffic speed forecast product.
+   * Using Hadoop MapReduce to analyse billions of lines of GPS data to create 
TrafficSpeeds, our accurate traffic speed forecast product.
  
   * [[http://www.karmasphere.com/|Karmasphere]]
    * Distributes [[http://www.hadoopstudio.org/|Karmasphere Studio for 
Hadoop]], which allows cross-version development and management of Hadoop jobs 
in a familiar integrated development environment.
  
   * [[http://katta.wiki.sourceforge.net/|Katta]] - Katta serves large Lucene 
indexes in a grid environment.
-    * Uses Hadoop FileSytem, RPC and IO
+   * Uses Hadoop FileSystem, RPC and IO
  
-  * [[http://www.koubei.com/|Koubei.com ]] Large local community and local 
search at China.
+  * [[http://www.koubei.com/|Koubei.com]] Large local community and local search in China.
-    Using Hadoop to process apache log, analyzing user's action and click flow 
and the links click with any specified page in site and more.  Using Hadoop to 
process whole price data user input with map/reduce.
+   . Using Hadoop to process Apache logs, analyzing user actions, click flow, and the link clicks associated with any specified page on the site, and more. Using Hadoop with map/reduce to process all of the price data that users input.
  
   * [[http://krugle.com/|Krugle]]
    * Source code search engine uses Hadoop and Nutch.
@@ -198, +194 @@

   * Our cluster runs on Amazon's EC2 web service and makes use of the streaming module to use Python for most operations.
  
   * [[http://www.lotame.com|Lotame]]
-    * Using Hadoop and Hbase for storage, log analysis, and pattern 
discovery/analysis.
+   * Using Hadoop and HBase for storage, log analysis, and pattern discovery/analysis.
  
   * [[http://www.mylife.com/|MyLife]]
    * 18 node cluster (Quad-Core AMD Opteron 2347, 1TB/node storage)
    * Powers data for search and aggregation
  
   * [[http://lucene.apache.org/mahout|Mahout]]
-    Another Apache project using Hadoop to build scalable machine learning   
algorithms like canopy clustering, k-means and many more to come (naive bayes 
classifiers, others)
+   . Another Apache project using Hadoop to build scalable machine learning algorithms such as canopy clustering and k-means, with many more to come (naive Bayes classifiers, among others).
  
   * [[http://metrixcloud.com/|MetrixCloud]] - provides commercial support, 
installation, and hosting of Hadoop Clusters. 
[[http://metrixcloud.com/contact.php|Contact Us.]]
  
   * [[http://www.openneptune.com|Neptune]]
-    * Another Bigtable cloning project using Hadoop to store large structured 
data set.
+   * Another Bigtable-cloning project using Hadoop to store large structured data sets.
-    * 200 nodes(each node has: 2 dual core CPUs, 2TB storage, 4GB RAM)
+   * 200 nodes (each node has 2 dual-core CPUs, 2 TB storage, and 4 GB RAM)
  
   * [[http://www.netseer.com|NetSeer]] -
    * Up to 1000 instances on 
[[http://www.amazon.com/b/ref=sc_fe_l_2/002-1156069-5604805?ie=UTF8&node=201590011&no=3435361&me=A36L942TSJ2AJA|Amazon
 EC2]]
@@ -242, +238 @@

    * Using HDFS for large archival data storage
  
   * [[http://www.psgtech.edu/|PSG Tech, Coimbatore, India]]
-   * Multiple alignment of protein sequences helps to determine evolutionary 
linkages and to predict molecular structures. The dynamic nature of the 
algorithm coupled with data and compute parallelism of hadoop data grids 
improves the accuracy and speed of sequence alignment. Parallelism at the 
sequence and block level reduces the time complexity of MSA problems. Scalable 
nature of Hadoop makes it apt to solve large scale alignment problems. 
+   * Multiple alignment of protein sequences helps to determine evolutionary linkages and to predict molecular structures. The dynamic nature of the algorithm, coupled with the data and compute parallelism of Hadoop data grids, improves the accuracy and speed of sequence alignment. Parallelism at the sequence and block level reduces the time complexity of MSA problems. The scalable nature of Hadoop makes it apt for solving large-scale alignment problems.
   * Our cluster size varies from 5 to 10 nodes. Cluster nodes vary from 2950 quad-core rack servers with 2x6 MB cache and 4x500 GB SATA hard drives to E7200/E7400 processors with 4 GB RAM and 160 GB HDDs.
-  
+ 
   * [[http://www.quantcast.com/|Quantcast]]
    * 3000 cores, 3500TB. 1PB+ processing each day.
    * Hadoop scheduler with fully custom data path / sorter
@@ -299, +295 @@

    We use Hadoop to facilitate information retrieval research & 
experimentation, particularly for TREC, using the Terrier IR platform. The open 
source release of [[http://ir.dcs.gla.ac.uk/terrier/|Terrier]] includes 
large-scale distributed indexing using Hadoop Map Reduce.
  
   * 
[[http://www.umiacs.umd.edu/~jimmylin/cloud-computing/index.html|University of 
Maryland]]
+   . We are one of six universities participating in IBM/Google's academic 
cloud computing initiative.  Ongoing research and teaching efforts include 
projects in machine translation, language modeling, bioinformatics, email 
analysis, and image processing.
-    We are one of six universities participating in IBM/Google's academic
-    cloud computing initiative.  Ongoing research and teaching efforts
-    include projects in machine translation, language modeling,
-    bioinformatics, email analysis, and image processing.
  
   * [[http://t2.unl.edu|University of Nebraska Lincoln, Research Computing 
Facility]]
+   . We currently run one medium-sized Hadoop cluster (200TB) to store and 
serve up physics data for the computing portion of the Compact Muon Solenoid 
(CMS) experiment.  This requires a filesystem which can download data at 
multiple Gbps and process data at an even higher rate locally.  Additionally, 
several of our students are involved in research projects on Hadoop.
-    We currently run one medium-sized Hadoop cluster (200TB) to store and 
serve up physics data
-    for the computing portion of the Compact Muon Solenoid (CMS) experiment.  
This requires a filesystem
-    which can download data at multiple Gbps and process data at an even 
higher rate locally.  Additionally,
-    several of our students are involved in research projects on Hadoop.
  
   * [[http://www.veoh.com|Veoh]]
    * We use a small Hadoop cluster to reduce usage data for internal metrics, 
for search indexing and for recommendation data.
@@ -318, +308 @@

   * [[http://www.vksolutions.com/|VK Solutions]]
    * We use a small Hadoop cluster in the scope of our general research 
activities at [[http://www.vklabs.com|VK Labs]] to get a faster data access 
from web applications.
    * We also use Hadoop for filtering and indexing listing, processing log 
analysis, and for recommendation data.
- 
  
   * [[http://devuelosbaratos.es/|Vuelos baratos]]
    * We use a small Hadoop
@@ -334, +323 @@

   * [[http://www.yahoo.com/|Yahoo!]]
    * More than 100,000 CPUs in >25,000 computers running Hadoop
   * Our biggest cluster: 4000 nodes (2x4-CPU boxes with 4x1 TB disk and 16 GB RAM)
-      * Used to support research for Ad Systems and Web Search
+    * Used to support research for Ad Systems and Web Search
-      * Also used to do scaling tests to support development of Hadoop on 
larger clusters
+    * Also used to do scaling tests to support development of Hadoop on larger 
clusters
    * [[http://developer.yahoo.com/blogs/hadoop|Our Blog]] - Learn more about 
how we use Hadoop.
   * >40% of Hadoop jobs within Yahoo are Pig jobs.
  
@@ -343, +332 @@

    * 10 node cluster (Dual-Core AMD Opteron 2210, 4GB RAM, 1TB/node storage)
    * Run Naive Bayes classifiers in parallel over crawl data to discover event 
information
  
- 
  ''When applicable, please include details about your cluster hardware and 
size.''
  
