[Hadoop Wiki] Update of "Hbase/PoweredBy" by RyanRawson

Apache Wiki Thu, 10 Sep 2009 16:27:19 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by RyanRawson:
http://wiki.apache.org/hadoop/Hbase/PoweredBy

------------------------------------------------------------------------------
  
  [http://www.streamy.com/ Streamy] is a recently launched realtime social news 
site.  We use HBase for all of our data storage, query, and analysis needs, 
replacing an existing SQL-based system.  This includes hundreds of millions of 
documents, sparse matrices, logs, and everything else once done in the 
relational system.  We perform significant in-memory caching of query results, 
similar to a traditional Memcached/SQL setup, and use other external 
components to perform joining and sorting.  We also run thousands of daily 
MapReduce jobs using HBase tables for log analysis, attention data processing, 
and feed crawling.  HBase has helped us scale and distribute in ways we could 
not otherwise, and the community has provided consistent and invaluable 
assistance.
  
+ [http://www.stumbleupon.com/ Stumbleupon] and [http://su.pr Su.pr] use HBase 
as a real-time data storage and analytics platform. Site features and 
statistics are served directly out of HBase and kept up to date in real time. 
We also use HBase as a map-reduce data source to overcome traditional 
query speed limits in MySQL. 
+ 
  [http://www.subrecord.org SubRecord Project] is an open source project that 
uses HBase as a repository of records (persisted map-like data) for the 
aspects it provides, such as logging, tracing, and metrics. HBase and a Lucene 
index together constitute the storage for this platform.
  
  [http://www.tokenizer.org Shopping Engine at Tokenizer] is a web crawler; it 
uses HBase to store more than a billion URLs and outlinks (!AnchorText + 
LinkedURL). It was initially designed as a Nutch-Hadoop extension, then (due to 
a very specific 'shopping' scenario) moved to SOLR + MySQL (InnoDB) (tens of 
thousands of queries per second), and now to HBase. HBase is significantly 
faster due to: no need for huge transaction logs, a column-oriented design that 
exactly matches the 'lazy' business logic, data compression, and !MapReduce 
support. The number of mutable 'indexes' (a term from RDBMS) is significantly 
reduced because each 'row::column' structure is physically sorted by 'row'. 
The MySQL InnoDB engine is the best DB choice for highly concurrent updates. 
However, the need to flush a block of data to the hard drive even if only a 
few bytes changed is an obvious bottleneck. HBase greatly helps: the 
'delete-insert', 'mutable primary key', and 'natural primary key' patterns, 
not so popular in modern DBMSs, become a big advantage with HBase.
