Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "Hbase/PoweredBy" page has been changed by udanax. The comment on this change is: add more info. http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=diff&rev1=57&rev2=58 -------------------------------------------------- [[http://www.twitter.com|Twitter]] runs HBase across its entire Hadoop cluster. HBase provides a distributed, read/write backup of all mysql tables in Twitter's production backend, allowing engineers to run MapReduce jobs over the data while maintaining the ability to apply periodic row updates (something that is more difficult to do with vanilla HDFS). A number of applications including people search rely on HBase internally for data generation. Additionally, the operations team uses HBase as a timeseries database for cluster-wide monitoring/performance data. - [[http://www.udanax.org|Udanax.org]] (URL shortener) use HBase cluster to store URLs, Web Log data and response the real-time request on its Web Server. This application is now used for some twitter clients and a number of web sites and the rows are increasing as almost 30 per second. + [[http://www.udanax.org|Udanax.org]] (URL shortener) use 10 nodes HBase cluster to store URLs, Web Log data and response the real-time request on its Web Server. This application is now used for some twitter clients and a number of web sites. Currently API requests are almost 30 per second and web redirection requests are about 300 per second. [[http://www.veoh.com/|Veoh Networks]] uses HBase to store and process visitor(human) and entity(non-human) profiles which are used for behavioral targeting, demographic detection, and personalization services. Our site reads this data in real-time (heavily cached) and submits updates via various batch map/reduce jobs. With 25 million unique visitors a month storing this data in a traditional RDBMS is not an option. We currently have a 24 node Hadoop/HBase cluster and our profiling system is sharing this cluster with our other Hadoop data pipeline processes.

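The Udanax.org entry also mentions serving real-time requests straight from HBase. A rough sketch of what such single-row reads and writes look like with the HBase client API follows; the table name ("urls"), column family ("u"), and qualifier ("target") are assumptions for illustration, not Udanax.org's actual schema.

{{{
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ShortUrlDao {
  // Hypothetical layout: row key = short code, family "u", qualifier "target".
  private static final byte[] FAMILY = Bytes.toBytes("u");
  private static final byte[] QUALIFIER = Bytes.toBytes("target");

  private final HTable table;

  public ShortUrlDao(Configuration conf) throws IOException {
    this.table = new HTable(conf, "urls");   // "urls" table name is an assumption
  }

  // Store a mapping from short code to target URL (single-row write).
  public void save(String shortCode, String targetUrl) throws IOException {
    Put put = new Put(Bytes.toBytes(shortCode));
    put.add(FAMILY, QUALIFIER, Bytes.toBytes(targetUrl));
    table.put(put);
  }

  // Resolve a short code to its target URL (single-row read), or null if absent.
  public String resolve(String shortCode) throws IOException {
    Get get = new Get(Bytes.toBytes(shortCode));
    Result result = table.get(get);
    byte[] value = result.getValue(FAMILY, QUALIFIER);
    return value == null ? null : Bytes.toString(value);
  }

  public static void main(String[] args) throws Exception {
    ShortUrlDao dao = new ShortUrlDao(HBaseConfiguration.create());
    dao.save("abc123", "http://hadoop.apache.org/hbase/");
    System.out.println(dao.resolve("abc123"));
  }
}
}}}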