Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by MichaelBieniosek:
http://wiki.apache.org/hadoop/Hbase/PoweredBy

------------------------------------------------------------------------------
  [http://www.videosurf.com/ VideoSurf] - "The video search engine that has taught computers to see". We're using HBase to persist various large graphs of data and other statistics. HBase was a real win for us because it let us store substantially larger datasets without manually partitioning the data, and its column-oriented nature allowed us to create schemas that were substantially more efficient for storing and retrieving data.
+ [http://www.powerset.com/ Powerset (a Microsoft company)] uses HBase to store raw documents. We have a ~70-node Hadoop cluster running DFS, MapReduce, and HBase. In our Wikipedia HBase table, we have one row for each Wikipedia page (~2.5M pages and climbing). We use this as input to our indexing jobs, which run in Hadoop MapReduce. Uploading the entire Wikipedia dump to our cluster takes a couple of hours. Scanning the table inside MapReduce is very fast -- the latency is in the noise compared to everything else we do.
+
