[Hadoop Wiki] Update of "PoweredBy" by SteveLoughran

Apache Wiki Tue, 06 Dec 2011 03:05:03 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "PoweredBy" page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=381&rev2=382

Comment:
rm some linkspam, review spelling and text

    * ''Our production cluster has been running since Oct 2008. ''
  
   * ''[[http://www.adyard.de|adyard]] ''
-   * ''We use Flume, Hadoop and Pig for log storage and report generation 
aswell as ad-Targeting. ''
+   * ''We use Flume, Hadoop and Pig for log storage and report generation as 
well as ad-Targeting. ''
    * ''We currently have 12 nodes running HDFS and Pig and plan to add more 
from time to time. ''
    * ''50% of our recommender system is pure Pig because of it's ease of use. 
''
-   * ''Some of our more deeply-integrated tasks are using the streaming api 
and ruby aswell as the excellent Wukong-Library. ''
+   * ''Some of our more deeply-integrated tasks are using the streaming API 
and ruby as well as the excellent Wukong-Library. ''
  
   * ''[[http://www.ablegrape.com/|Able Grape]] - Vertical search engine for 
trustworthy wine information ''
-   * ''We have one of the world's smaller hadoop clusters (2 nodes @ 8 
CPUs/node) ''
+   * ''We have one of the world's smaller Hadoop clusters (2 nodes @ 8 
CPUs/node) ''
    * ''Hadoop and Nutch used to analyze and index textual information ''
  
   * ''[[http://adknowledge.com/|Adknowledge]] - Ad network ''
@@ -49, +49 @@

    * ''Each node has 8 cores, 16G RAM and 1.4T storage. ''
  
   * ''[[http://aol.com/|AOL]] ''
-   * ''We use hadoop for variety of things ranging from ETL style processing 
and statistics generation to running advanced algorithms for doing behavioral 
analysis and targeting. ''
+   * ''We use Hadoop for variety of things ranging from ETL style processing 
and statistics generation to running advanced algorithms for doing behavioral 
analysis and targeting. ''
    * ''The Cluster that we use for mainly behavioral analysis and targeting 
has 150 machines, Intel Xeon, dual processors, dual core, each with 16GB Ram 
and 800 GB hard-disk. ''
  
   * ''[[http://www.ara.com.tr/|ARA.COM.TR]] - Ara Com Tr - Turkey's first and 
only search engine ''
@@ -59, +59 @@

    * ''Our clusters vary from 10 to 100 nodes ''
  
   * ''[[http://atbrox.com/|Atbrox]] ''
-   * ''We use hadoop for information extraction & search, and data analysis 
consulting ''
+   * ''We use Hadoop for information extraction & search, and data analysis 
consulting ''
    * ''Cluster: we primarily use Amazon's Elastic MapReduce ''
  
  = B =
@@ -75, +75 @@

  
   * ''[[http://www.beebler.com|Beebler]] ''
    * ''14 node cluster (each node has: 2 dual core CPUs, 2TB storage, 8GB RAM) 
''
-   * ''We use hadoop for matching dating profiles ''
+   * ''We use Hadoop for matching dating profiles ''
  
   * ''[[http://www.benipaltechnologies.com|Benipal Technologies]] - 
Outsourcing, Consulting, Innovation ''
    * ''35 Node Cluster (Core2Quad Q9400 Processor, 4-8 GB RAM, 500 GB HDD) ''
@@ -150, +150 @@

  
   * ''[[http://www.deepdyve.com|Deepdyve]] ''
    * ''Elastic cluster with 5-80 nodes ''
-   * ''We use hadoop to create our indexes of deep web content and to provide 
a high availability and high bandwidth storage service for index shards for our 
search cluster. ''
+   * ''We use Hadoop to create our indexes of deep web content and to provide 
a high availability and high bandwidth storage service for index shards for our 
search cluster. ''
  
   * ''[[http://www.wirtschaftsdetektei-berlin.de|Detektei Berlin]] ''
    * ''We are using Hadoop in our data mining and multimedia/internet research 
groups. ''
    * ''3 node cluster with 48 cores in total, 4GB RAM and 1 TB storage each. ''
  
   * ''[[http://search.detik.com|Detikcom]] - Indonesia's largest news portal ''
-   * ''We use hadoop, pig and hbase to analyze search log, generate Most View 
News, generate top wordcloud, and analyze all of our logs ''
+   * ''We use Hadoop, pig and HBase to analyze search log, generate Most View 
News, generate top wordcloud, and analyze all of our logs ''
    * ''Currently We use 9 nodes ''
  
   * ''[[http://www.devdaily.com|devdaily.com]] ''
@@ -209, +209 @@

    * ''Currently we have 2 major clusters:    * A 1100-machine cluster with 
8800 cores and about 12 PB raw storage. ''
     * ''A 300-machine cluster with 2400 cores and about 3 PB raw storage. ''
     * ''Each (commodity) node has 8 cores and 12 TB of storage. ''
-    * ''We are heavy users of both streaming as well as the Java apis. We have 
built a higher level data warehousing framework using these features called 
Hive (see the http://hadoop.apache.org/hive/). We have also developed a FUSE 
implementation over hdfs. ''
+    * ''We are heavy users of both streaming as well as the Java APIs. We have 
built a higher level data warehousing framework using these features called 
Hive (see the http://hadoop.apache.org/hive/). We have also developed a FUSE 
implementation over HDFS. ''
  
   * ''[[http://www.foxaudiencenetwork.com|FOX Audience Network]] ''
    * ''40 machine cluster (8 cores/machine, 2TB/machine storage) ''
@@ -227, +227 @@

    * ''Machine learning ''
  
   * ''[[http://freestylers.jp/|Freestylers]] - Image retrieval engine ''
-   * ''[[http://www.kralarabaoyunlari.com|Araba oyunları]] - Araba oyunları ''
-   * [[http://www.pepe-izle.gen.tr/|Pepe izle]] - Pepe izle
-   * [[http://www.scratchcardportal.com|scratch cards]] -Scratch Cards
-   * ''We Japanese company Freestylers use Hadoop to build the image 
processing environment for image-based product recommendation system mainly on 
Amazon EC2, from April 2009. ''
+   * ''We, the Japanese company Freestylers, use Hadoop to build the image 
processing environment for image-based product recommendation system mainly on 
Amazon EC2, from April 2009. ''
    * ''Our Hadoop environment produces the original database for fast access 
from our web application. ''
    * ''We also uses Hadoop to analyzing similarities of user's behavior. ''
  
@@ -260, +257 @@

  
  = H =
   * ''[[http://www.hadoop.co.kr/|Hadoop Korean User Group]], a Korean Local 
Community Team Page. ''
-   * ''50 node cluster In the Korea university network environment.    * 
Pentium 4 PC, HDFS 4TB Storage ''
+   * ''50 node cluster In the Korea university network environment.
+   * Pentium 4 PC, HDFS 4TB Storage ''
  
-  * ''Used for development projects    * Retrieving and Analyzing Biomedical 
Knowledge ''
+  * ''Used for development projects
+   * Retrieving and Analyzing Biomedical Knowledge ''
    * ''Latent Semantic Analysis, Collaborative Filtering ''
  
   * ''[[http://www.hotelsandaccommodation.com.au/|Hotels & Accommodation]] ''
@@ -373, +372 @@

     * ''120 Nehalem-based Sun x4275, with 2x4 cores, 24GB RAM, 8x1TB SATA ''
     * ''580 Westmere-based HP SL 170x, with 2x4 cores, 24GB RAM, 6x2TB SATA ''
     * ''1200 Westmere-based SuperMicro X8DTT-H, with 2x6 cores, 24GB RAM, 
6x2TB SATA ''
+    * ''Software:
-    * ''Software:     * CentOS 5.5 -> RHEL 6.1 ''
+     * CentOS 5.5 -> RHEL 6.1 ''
      * ''Sun JDK 1.6.0_14 -> Sun JDK 1.6.0_20 -> Sun JDK 1.6.0_26 ''
      * ''Apache Hadoop 0.20.2+patches -> Apache Hadoop 0.20.204+patches ''
      * ''Pig 0.9 heavily customized ''
@@ -407, +407 @@

    * ''Use a mix of Java, Pig and Hive. ''
  
   * ''[[http://www.memonews.com/en//|MeMo News - Online and Social Media 
Monitoring]] ''
-   * ''we use hadoop ''
+   * ''we use Hadoop ''
-    * ''as plattform for distributed crawling ''
+    * ''as platform for distributed crawling ''
     * ''to store and process unstructured data, such as news and social media 
(Hadoop, PIG, MapRed and HBase) ''
     * ''log file aggregation and processing (Flume) ''
  
   * ''[[http://www.mercadolibre.com//|Mercadolibre.com]] ''
    * ''20 nodes cluster (12 * 20 cores, 32GB, 53.3TB) ''
-   * ''Custemers log on on-line apps ''
+   * ''Customers log on on-line apps ''
    * ''Operations log processing ''
    * ''Use java, pig, hive, oozie ''
  
   * ''[[http://www.mobileanalytics.tv//|MobileAnalytic.TV]] ''
    * ''We use Hadoop to develop MapReduce algorithms: ''
-    * ''Information retrival and analytics ''
+    * ''Information retrieval and analytics ''
     * ''Machine generated content - documents, text, audio, & video ''
     * ''Natural Language Processing ''
    * ''Project portfolio includes:    * Natural Language Processing ''
@@ -464, +464 @@

  = O =
   * ''[[http://www.optivo.com|optivo]] - Email marketing software ''
    * ''We use Hadoop to aggregate and analyse email campaigns and user 
interactions. ''
-   * ''Developement is based on the github repository. ''
+   * ''Development is based on the github repository. ''
  
  = P =
   * ''[[http://papertrailapp.com/|Papertrail]] - Hosted syslog and app log 
management ''
@@ -500, +500 @@

    * ''We use Hadoop for analyzing poker players game history and generating 
gameplay related players statistics ''
  
   * ''[[http://www.portabilite.info|Portabilité]] ''
-   * ''50 node cluster in Colo. ''
+   * ''50 node cluster in a colocated site. ''
-   * ''Also used as a proof of concept cluster for a cloud based ERP syste. ''
+   * ''Also used as a proof of concept cluster for a cloud based ERP system. ''
  
   * ''[[http://www.psgtech.edu/|PSG Tech, Coimbatore, India]] ''
-   * ''Multiple alignment of protein sequences helps to determine evolutionary 
linkages and to predict molecular structures. The dynamic nature of the 
algorithm coupled with data and compute parallelism of hadoop data grids 
improves the accuracy and speed of sequence alignment. Parallelism at the 
sequence and block level reduces the time complexity of MSA problems. Scalable 
nature of Hadoop makes it apt to solve large scale alignment problems. ''
+   * ''Multiple alignment of protein sequences helps to determine evolutionary 
linkages and to predict molecular structures. The dynamic nature of the 
algorithm coupled with data and compute parallelism of Hadoop data grids 
improves the accuracy and speed of sequence alignment. Parallelism at the 
sequence and block level reduces the time complexity of MSA problems. The 
scalable nature of Hadoop makes it apt to solve large scale alignment problems. 
''
    * ''Our cluster size varies from 5 to 10 nodes. Cluster nodes vary from 
2950 Quad Core Rack Server, with 2x6MB Cache and 4 x 500 GB SATA Hard Drive to 
E7200 / E7400 processors with 4 GB RAM and 160 GB HDD. ''
  
  = Q =
@@ -524, +524 @@

  
   * ''[[http://www.rapleaf.com/|Rapleaf]] ''
    * ''80 node cluster (each node has: 2 quad core CPUs, 4TB storage, 16GB 
RAM) ''
-   * ''We use hadoop to process data relating to people on the web ''
+   * ''We use Hadoop to process data relating to people on the web ''
    * ''We also involved with Cascading to help simplify how our data flows 
through various processing stages ''
  
   * ''[[http://www.recruit.jp/corporate/english/|Recruit]] ''
@@ -544, +544 @@

  
   * ''[[http://www.rightnow.com/|RightNow Technologies]] - Powering Great 
Experiences ''
    * ''16 node cluster (each node has: 2 quad core CPUs, 6TB storage, 24GB 
RAM) ''
-   * ''We use hadoop for log and usage analysis ''
+   * ''We use Hadoop for log and usage analysis ''
    * ''We predominantly leverage Hive and HUE for data access ''
  
   * ''[[http://www.rubbelloselotto.de/|Rubbellose]] ''
@@ -555, +555 @@

    * ''SARA has initiated a Proof-of-Concept project to evaluate the Hadoop 
software stack for scientific use. ''
  
   * ''[[http://alpha.search.wikia.com|Search Wikia]] ''
-   * ''A project to help develop open source social search tools. We run a 125 
node hadoop cluster. ''
+   * ''A project to help develop open source social search tools. We run a 125 
node Hadoop cluster. ''
  
   * ''[[http://wwwse.inf.tu-dresden.de/SEDNS/SEDNS_home.html|SEDNS]] - 
Security Enhanced DNS Group ''
    * ''We are gathering world wide DNS data in order to discover content 
distribution networks and configuration issues utilizing Hadoop DFS and MapRed. 
''
@@ -628, +628 @@

    * ''We use Hadoop for log analysis. ''
  
   * ''[[http://www.tubemogul.com|TubeMogul]] ''
-   * ''We use Hadoop HDFS, Map/Reduce, Hive and Hbase ''
+   * ''We use Hadoop HDFS, Map/Reduce, Hive and HBase ''
  
    * ''We manage over 300 TB of HDFS data across four Amazon EC2 Availability 
Zone ''
  
@@ -640, +640 @@

    * ''We use both Scala and Java to access Hadoop's MapReduce APIs ''
    * ''We use Pig heavily for both scheduled and ad-hoc jobs, due to its 
ability to accomplish a lot with few statements. ''
    * ''We employ committers on Pig, Avro, Hive, and Cassandra, and contribute 
much of our internal Hadoop work to opensource (see 
[[http://github.com/kevinweil/hadoop-lzo|hadoop-lzo]]) ''
-   * ''For more on our use of hadoop, see the following presentations: 
[[http://www.slideshare.net/kevinweil/hadoop-pig-and-twitter-nosql-east-2009|Hadoop
 and Pig at Twitter]] and 
[[http://www.slideshare.net/kevinweil/protocol-buffers-and-hadoop-at-twitter|Protocol
 Buffers and Hadoop at Twitter]] ''
+   * ''For more on our use of Hadoop, see the following presentations: 
[[http://www.slideshare.net/kevinweil/hadoop-pig-and-twitter-nosql-east-2009|Hadoop
 and Pig at Twitter]] and 
[[http://www.slideshare.net/kevinweil/protocol-buffers-and-hadoop-at-twitter|Protocol
 Buffers and Hadoop at Twitter]] ''
  
   * ''[[http://tynt.com|Tynt]] ''
    * ''We use Hadoop to assemble web publishers' summaries of what users are 
copying from their websites, and to analyze user engagement on the web. ''
@@ -686, +686 @@

   * ''[[http://www.webmastersitesi.com|Webmaster Site]] ''
    * ''We use Hadoop for our webmaster tools. It allows us to store, index, 
search data in a much fast way. We also use it for logs analysis and trends 
prediction.''
    * ''4 node cluster (each node has: 4 core AMD CPUs, 2TB storage, 32GB RAM)''
-   * ''We use hadoop to process log data and perform on-demand analytics as 
well''
+   * ''We use Hadoop to process log data and perform on-demand analytics as 
well''
   * ''[[http://www.worldlingo.com/|WorldLingo]] ''
    * ''Hardware: 44 servers (each server has: 2 dual core CPUs, 2TB storage, 
8GB RAM) ''
    * ''Each server runs Xen with one Hadoop/HBase instance and another 
instance with web or application servers, giving us 88 usable virtual machines. 
''
