The following page has been changed by RyanRawson:
http://wiki.apache.org/hadoop/Hbase/PoweredBy

------------------------------------------------------------------------------
  [http://www.streamy.com/ Streamy] is a recently launched realtime social news site. We use HBase for all of our data storage, query, and analysis needs, replacing an existing SQL-based system. This includes hundreds of millions of documents, sparse matrices, logs, and everything else once done in the relational system. We perform significant in-memory caching of query results, similar to a traditional Memcached/SQL setup, and use other external components to perform joining and sorting. We also run thousands of daily MapReduce jobs using HBase tables for log analysis, attention data processing, and feed crawling. HBase has helped us scale and distribute in ways we could not otherwise, and the community has provided consistent and invaluable assistance.

+ [http://www.stumbleupon.com/ Stumbleupon] and [http://su.pr Su.pr] use HBase as a real-time data storage and analytics platform. Because we serve directly out of HBase, various site features and statistics are kept up to date in real time. We also use HBase as a map-reduce data source to overcome traditional query speed limits in MySQL.

+ [http://www.subrecord.org SubRecord Project] is an Open Source project that uses HBase as a repository of records (persisted map-like data) for the aspects it provides, such as logging, tracing, or metrics. HBase and a Lucene index together constitute the repo/storage for this platform.

  [http://www.tokenizer.org Shopping Engine at Tokenizer] is a web crawler; it uses HBase to store URLs and Outlinks (!AnchorText + LinkedURL): more than a billion. It was initially designed as a Nutch-Hadoop extension, then (due to a very specific 'shopping' scenario) moved to SOLR + MySQL(InnoDB) (tens of thousands of queries per second), and now to HBase.
HBase is significantly faster due to: no need for huge transaction logs, a column-oriented design that exactly matches our 'lazy' business logic, data compression, and !MapReduce support. The number of mutable 'indexes' (a term from RDBMS) is significantly reduced because each 'row::column' structure is physically sorted by 'row'. The MySQL InnoDB engine is the best DB choice for highly concurrent updates. However, the need to flush a whole block of data to the hard drive even when only a few bytes have changed is an obvious bottleneck. HBase greatly helps: the 'delete-insert', 'mutable primary key', and 'natural primary key' patterns, which are not so popular in modern DBMSs, become a big advantage with HBase.
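
The point about rows being physically sorted can be illustrated with a small sketch. This is only a toy in-memory model, not Tokenizer's code and not the HBase client API: the class `SortedTable`, its methods `put`, `scan`, and `delete_row`, and the reversed-host row keys are all hypothetical names chosen for illustration. It assumes cells are kept ordered by (row, column), so a range of rows can be read without a separate index, and a "mutable primary key" becomes a plain delete followed by an insert.

```python
import bisect


class SortedTable:
    """Toy stand-in for a table whose cells are kept sorted by (row, column)."""

    def __init__(self):
        self._keys = []   # sorted list of (row, column) tuples
        self._cells = {}  # (row, column) -> value

    def put(self, row, column, value):
        key = (row, column)
        if key not in self._cells:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._cells[key] = value

    def delete_row(self, row):
        # (row,) sorts before every (row, column), so this finds the row's first cell.
        lo = bisect.bisect_left(self._keys, (row,))
        hi = lo
        while hi < len(self._keys) and self._keys[hi][0] == row:
            del self._cells[self._keys[hi]]
            hi += 1
        del self._keys[lo:hi]

    def scan(self, start_row, stop_row):
        """Yield (row, column, value) for start_row <= row < stop_row, in key order."""
        lo = bisect.bisect_left(self._keys, (start_row,))
        for key in self._keys[lo:]:
            if key[0] >= stop_row:
                break
            yield key[0], key[1], self._cells[key]


table = SortedTable()
# Hypothetical reversed-host row keys, so pages of one site sort next to each other.
table.put("com.example/a", "outlink:1", "http://shop.example.org/")
table.put("org.sample/x", "outlink:1", "http://example.com/b")
table.put("com.example/b", "outlink:1", "http://sample.org/x")

# Range scan over one site: '0' is the character right after '/' in ASCII.
rows = [row for row, _, _ in table.scan("com.example/", "com.example0")]

# "Mutable primary key" as delete-insert: move the cell to a new row key.
table.delete_row("com.example/a")
table.put("com.example/c", "outlink:1", "http://shop.example.org/")
rows_after = [row for row, _, _ in table.scan("com.example/", "com.example0")]
```

In this sketch the scan needs no secondary index because the sort order of the row keys already groups related rows together, which is the effect the paragraph above attributes to HBase's row-sorted storage.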
