Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by JustinLynn:
http://wiki.apache.org/hadoop/Hbase/PoweredBy

The comment on the change is:
add SM

------------------------------------------------------------------------------
  [http://www.openplaces.org Openplaces] is a search engine for travel that uses HBase to store terabytes of web pages and travel-related entity records (countries, cities, hotels, etc.). We have dozens of MapReduce jobs that crunch data on a daily basis. We use a 20-node cluster for development, a 40-node cluster for offline production processing, and an EC2 cluster for the live web site.

  [http://www.powerset.com/ Powerset (a Microsoft company)] uses HBase to store raw documents. We have a ~110-node Hadoop cluster running DFS, MapReduce, and HBase. In our Wikipedia HBase table, we have one row for each Wikipedia page (~2.5M pages and climbing). We use this as input to our indexing jobs, which are run in Hadoop MapReduce. Uploading the entire Wikipedia dump to our cluster takes a couple of hours. Scanning the table inside MapReduce is very fast -- the latency is in the noise compared to everything else we do.
+
+ [http://www.socialmedia.com/ SocialMedia] uses HBase to store and process user events, which allows us to provide near-realtime user metrics and reporting. HBase forms the heart of our Advertising Network data storage and management system. We use HBase as a data source and sink for realtime request cycle queries and as a backend for MapReduce analysis.

  [http://www.streamy.com/ Streamy] is a recently launched realtime social news site. We use HBase for all of our data storage, query, and analysis needs, replacing an existing SQL-based system. This includes hundreds of millions of documents, sparse matrices, logs, and everything else once done in the relational system. We perform significant in-memory caching of query results, similar to a traditional Memcached/SQL setup, and use other external components to perform joining and sorting. We also run thousands of daily MapReduce jobs using HBase tables for log analysis, attention data processing, and feed crawling. HBase has helped us scale and distribute in ways we could not otherwise, and the community has provided consistent and invaluable assistance.
