Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hbase/PoweredBy" page has been changed by AbeTaha.
http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=diff&rev1=35&rev2=36

--------------------------------------------------

  
  [[http://www.flurry.com|Flurry]] provides mobile application analytics.  We 
use HBase and Hadoop for all of our analytics processing, and serve all of our 
live requests directly out of HBase on our 16-node production cluster with 
billions of rows over several tables.
  
+ [[http://www.drawntoscaleconsulting.com|Drawn to Scale Consulting]] consults 
on HBase, Hadoop, Distributed Search, and Scalable architectures.
  
  [[http://gumgum.com|GumGum]] is an analytics and monetization platform for 
online content. We've developed usage-based licensing models that make the best 
content in the world accessible to publishers of all sizes.  We use HBase 
0.20.0 on a 4-node Amazon EC2 cluster to record visits to advertisers in our ad 
network. Our production cluster has been running since July 2009.
  
+ [[http://www.mahalo.com|Mahalo]], "...the world's first human-powered search
engine". All the markup that powers the wiki is stored in HBase. It's been in
use for a few months now. !MediaWiki - the same software that powers Wikipedia -
has version/revision control. Mahalo's in-house editors produce a lot of
revisions per day, which was not working well in an RDBMS. An HBase-based
solution for this was built and tested, and the data migrated out of MySQL and
into HBase. Right now it's at something like 6 million items in HBase. The
upload tool runs every hour from a shell script to back up that data, and on 6
nodes takes about 5-10 minutes to run - and does not slow down production at
all.
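
A minimal sketch of how per-page markup revisions could be kept in HBase by
letting a column family retain many cell versions; the table and family names
below are illustrative assumptions, not Mahalo's actual schema (HBase 0.20-era
Java client API):

{{{
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WikiRevisionStore {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();

    // Hypothetical table: one row per wiki page, markup kept as timestamped
    // cell versions so older revisions stay readable.
    HTableDescriptor desc = new HTableDescriptor("wiki_pages");
    HColumnDescriptor markup = new HColumnDescriptor(Bytes.toBytes("markup"));
    markup.setMaxVersions(Integer.MAX_VALUE);  // keep every revision
    desc.addFamily(markup);
    new HBaseAdmin(conf).createTable(desc);

    // Each save writes a new version of the same cell.
    HTable table = new HTable(conf, "wiki_pages");
    Put put = new Put(Bytes.toBytes("How_to_make_coffee"));
    put.add(Bytes.toBytes("markup"), Bytes.toBytes("wikitext"),
            Bytes.toBytes("== Brewing ==\nGrind the beans..."));
    table.put(put);
  }
}
}}}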
  
  [[http://www.meetup.com|Meetup]] is on a mission to help the world’s people 
self-organize into local groups.  We use Hadoop and HBase to power a site-wide, 
real-time activity feed system for all of our members and groups.  Group 
activity is written directly to HBase, and indexed per member, with the 
member's custom feed served directly from HBase for incoming requests.  We're 
running HBase 0.20.0 on an 11-node cluster.
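
One common way to serve a per-member feed directly out of HBase is to key rows
by member id plus a reversed timestamp, so a short forward scan returns the
newest activity first. The sketch below illustrates that pattern with invented
table, family, and key names; it is not Meetup's actual schema (HBase 0.20-era
Java client API):

{{{
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ActivityFeed {
  private static final byte[] INFO = Bytes.toBytes("info");

  // Row key "memberId:reversedTimestamp" sorts each member's newest
  // activity first, so a short forward scan returns the latest items.
  private static byte[] feedKey(String memberId, long when) {
    return Bytes.toBytes(memberId + ":" + (Long.MAX_VALUE - when));
  }

  public static void recordActivity(HTable feed, String memberId,
                                    String groupId, String text) throws Exception {
    Put put = new Put(feedKey(memberId, System.currentTimeMillis()));
    put.add(INFO, Bytes.toBytes("group"), Bytes.toBytes(groupId));
    put.add(INFO, Bytes.toBytes("text"), Bytes.toBytes(text));
    feed.put(put);
  }

  public static void printFeed(HTable feed, String memberId) throws Exception {
    // Scan only this member's slice of the key space.
    Scan scan = new Scan(Bytes.toBytes(memberId + ":"),
                         Bytes.toBytes(memberId + ";"));  // ';' sorts just after ':'
    ResultScanner scanner = feed.getScanner(scan);
    try {
      for (Result row : scanner) {
        System.out.println(Bytes.toString(row.getValue(INFO, Bytes.toBytes("text"))));
      }
    } finally {
      scanner.close();
    }
  }

  public static void main(String[] args) throws Exception {
    HTable feed = new HTable(new HBaseConfiguration(), "member_feed");
    recordActivity(feed, "member42", "hiking-nyc", "New hike posted for Saturday");
    printFeed(feed, "member42");
  }
}
}}}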
+ 
+ [[http://ning.com|Ning]] uses HBase to store and serve the results of 
processing user events and log files, which allows us to provide near-real-time
analytics and reporting. We use a small cluster of commodity machines with 4 
cores and 16GB of RAM per machine to handle all our analytics and reporting 
needs.
  
  [[http://www.openplaces.org|Openplaces]] is a search engine for travel that 
uses HBase to store terabytes of web pages and travel-related entity records 
(countries, cities, hotels, etc.). We have dozens of MapReduce jobs that crunch 
data on a daily basis.  We use a 20-node cluster for development, a 40-node 
cluster for offline production processing and an EC2 cluster for the live web 
site.
  
@@ -20, +22 @@

  
  [[http://www.streamy.com/|Streamy]] is a recently launched realtime social 
news site.  We use HBase for all of our data storage, query, and analysis 
needs, replacing an existing SQL-based system.  This includes hundreds of 
millions of documents, sparse matrices, logs, and everything else once done in 
the relational system.  We perform significant in-memory caching of query
results, similar to a traditional Memcached/SQL setup, and use other external
components to perform joining and sorting.  We also run thousands of daily
MapReduce jobs using HBase tables for log analysis, attention data processing, 
and feed crawling.  HBase has helped us scale and distribute in ways we could 
not otherwise, and the community has provided consistent and invaluable 
assistance.
  
+ [[http://www.stumbleupon.com/|Stumbleupon]] and [[http://su.pr|Su.pr]] use
HBase as a real-time data storage and analytics platform. Various site features
and statistics are kept up to date in real time and served directly out of
HBase. We also use HBase as a map-reduce data source to overcome traditional
query speed limits in MySQL.
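
HBase 0.20 ships a mapreduce package that lets a table act as the input source
for a Hadoop job, which is the general mechanism behind using HBase as a
map-reduce data source. The sketch below wires a full-table scan into a
map-only job via TableMapReduceUtil; the table and column family names are
placeholders, not the Su.pr schema:

{{{
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class UrlHitCount {

  // The mapper receives one HBase row (row key + Result) per call.
  static class HitMapper extends TableMapper<Text, LongWritable> {
    private static final byte[] CLICKS = Bytes.toBytes("clicks");

    protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx)
        throws IOException, InterruptedException {
      // Emit one count per row; a reducer (omitted here) would sum these.
      ctx.write(new Text(Bytes.toString(rowKey.get())),
                new LongWritable(row.getFamilyMap(CLICKS).size()));
    }
  }

  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    Job job = new Job(conf, "url-hit-count");
    job.setJarByClass(UrlHitCount.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("clicks"));  // full-table scan over one family

    // Wire the HBase table in as the job's input source.
    TableMapReduceUtil.initTableMapperJob("url_stats", scan, HitMapper.class,
        Text.class, LongWritable.class, job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.waitForCompletion(true);
  }
}
}}}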
  
  [[http://www.subrecord.org|SubRecord Project]] is an Open Source project that
uses HBase as a repository of records (persisted map-like data) for the aspects
it provides, such as logging, tracing, and metrics. HBase and a Lucene index
together constitute the repository/storage for this platform.
  
@@ -32, +34 @@

  
  [[http://www.videosurf.com/|VideoSurf]] - "The video search engine that has
taught computers to see". We're using HBase to persist various large graphs of
data and other statistics. HBase was a real win for us because it let us store
substantially larger datasets without the need to manually partition the data,
and its column-oriented nature allowed us to create schemas that were
substantially more efficient for storing and retrieving data.
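
A column-oriented schema is a natural fit for large graphs: one row per node
and one column per edge, so an adjacency list can grow without repartitioning
the data. The sketch below shows that pattern with invented names; it is not
VideoSurf's actual schema:

{{{
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VideoGraph {
  private static final byte[] EDGES = Bytes.toBytes("edges");

  public static void main(String[] args) throws Exception {
    HTable graph = new HTable(new HBaseConfiguration(), "video_graph");

    // One row per video; each related video becomes its own column
    // qualifier, so a node's edge list can grow without repartitioning.
    Put put = new Put(Bytes.toBytes("video:123"));
    put.add(EDGES, Bytes.toBytes("video:456"), Bytes.toBytes("0.87"));  // similarity score
    put.add(EDGES, Bytes.toBytes("video:789"), Bytes.toBytes("0.42"));
    graph.put(put);

    // Reading the row back returns the whole adjacency list in one Get.
    Result row = graph.get(new Get(Bytes.toBytes("video:123")));
    System.out.println(row.getFamilyMap(EDGES).size() + " neighbors");
  }
}
}}}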
  
+ [[http://www.visibletechnologies.com/|Visible Technologies]] - We use Hadoop,
HBase, Katta, and more to collect, parse, store, and search hundreds of
millions of pieces of social media content. We get incredibly fast throughput
and very low latency on commodity hardware. HBase enables our business to exist.
  
  [[http://www.worldlingo.com/|WorldLingo]] - The !WorldLingo Multilingual
Archive. We use HBase to store millions of documents that we scan using
Map/Reduce jobs to machine-translate them into all or selected target languages
from our set of available machine translation languages. We currently store 12
million documents but plan to eventually reach the 450 million mark. HBase
allows us to scale out as we need to grow our storage capacities. Combined with
Hadoop to keep the data replicated and therefore fail-safe, we have the backbone
our service can rely on now and in the future. !WorldLingo has been using HBase
since December 2007 and is, along with a few others, one of the longest-running
HBase installations. Currently we are running the latest HBase 0.20 and serving
directly from it:
[[http://www.worldlingo.com/ma/enwiki/en/HBase|MultilingualArchive]].
  
