Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hbase/PoweredBy" page has been changed by OtisGospodnetic:
http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=diff&rev1=74&rev2=75

Comment:
Removed SubRecord project - it's dead

  
  [[http://www.stumbleupon.com/|Stumbleupon]] and [[http://su.pr|Su.pr]] use 
HBase as a real time data storage and analytics platform. Serving directly out 
of HBase, various site features and statistics are kept up to date in a real 
time fashion. We also use HBase as a map-reduce data source to overcome 
traditional query speed limits in MySQL.
  
- [[http://www.subrecord.org|SubRecord Project]] is an Open Source project that 
is using HBase as a repository of records (persisted map-like data) for the 
aspects it provides like logging, tracing or metrics. HBase and Lucene index 
both constitute a repo/storage for this platform.
- 
  [[http://www.tokenizer.org|Shopping Engine at Tokenizer]] is a web crawler; 
it uses HBase to store URLs and Outlinks (!AnchorText + LinkedURL): more than a 
billion. It was initially designed as a Nutch-Hadoop extension, then (due to a 
very specific 'shopping' scenario) moved to SOLR + MySQL(InnoDB) (tens of 
thousands of queries per second), and now to HBase. HBase is significantly 
faster due to: no need for huge transaction logs, a column-oriented design that 
exactly matches the 'lazy' business logic, data compression, and !MapReduce 
support. The number of mutable 'indexes' (a term from RDBMS) is significantly 
reduced because each 'row::column' structure is physically sorted by 'row'. 
The MySQL InnoDB engine is the best DB choice for highly concurrent updates. 
However, the need to flush a block of data to the hard drive even if only a few 
bytes changed is an obvious bottleneck. HBase greatly helps: the 
'delete-insert', 'mutable primary key', and 'natural primary key' patterns, 
which are not so popular in modern DBMSs, become a big advantage with HBase.
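
To illustrate the point above about rows being physically sorted, here is a minimal Python sketch. It is not HBase code: the `SortedTable` class, the reversed-domain row keys, and the `outlink:` column names are all assumptions made up for this example. It only shows how a prefix scan over sorted row keys can answer a query that would otherwise need a separate index.

```python
import bisect


class SortedTable:
    """Toy in-memory stand-in for a store that keeps rows sorted by key.

    Hypothetical helper for illustration only; it does not use or mimic
    the real HBase client API.
    """

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row, column, value):
        # Insert the row key at its sorted position the first time it is seen.
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def scan_prefix(self, prefix):
        """Yield (row, columns) for every row key starting with prefix."""
        # Binary search finds the first candidate key; matching keys are
        # contiguous because the key list is sorted.
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


table = SortedTable()
# Assumed layout: row key = reversed-domain URL, column = outlink target.
table.put("org.example/shop/a", "outlink:org.example/shop/b", "anchor text b")
table.put("com.other/page", "outlink:org.example/shop/a", "anchor text a")
table.put("org.example/shop/b", "outlink:com.other/page", "anchor text c")

shop_rows = [row for row, _ in table.scan_prefix("org.example/")]
print(shop_rows)
```

Because all keys sharing a prefix sit next to each other, the scan touches only the matching rows, which is the property the entry credits for needing fewer mutable indexes.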
  
  [[http://traackr.com/|Traackr]] uses HBase to store and serve online 
influencer data in real-time. We use MapReduce to frequently re-score our 
entire data set as we keep updating influencer metrics on a daily basis.
