Usman,
HDFS is a distributed grid file system, as opposed to a live serving
database. A simple analogy is the difference between the Linux file
system and MySQL running on top of Linux to turn it into a database.
Furthermore, HDFS is optimized for throughput rather than latency,
hence it won't be very responsive for live serving.
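To make the distinction concrete, here is a minimal sketch of reading a
file through the Java FileSystem API (the /logs/day1.txt path is made
up for illustration): HDFS hands you byte streams to scan, not rows to
query:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import java.io.BufferedReader;
  import java.io.InputStreamReader;

  public class HdfsRead {
    public static void main(String[] args) throws Exception {
      // Connects to whatever cluster core-site.xml points at.
      FileSystem fs = FileSystem.get(new Configuration());
      // Hypothetical path; you get a stream and scan the whole file,
      // there is no "look up row X" primitive.
      BufferedReader in = new BufferedReader(
          new InputStreamReader(fs.open(new Path("/logs/day1.txt"))));
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
      in.close();
    }
  }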
That said, you should take a look at HBase, a key-value
database built on top of HDFS and part of the official Apache
Hadoop project. HBase not only lowers request latency for live
serving, but also gives you basic transactional semantics (on a
per-key basis) so that you can do updates/inserts/deletes. I think
StumbleUpon is now serving their live traffic directly from HBase; you
can take a look at their presentation from the NoSQL event last month at:
http://blog.oskarsson.nu/2009/06/nosql-debrief.html
From that link you can also download presentations for a number of
other scalable low-latency key-value stores.
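To give you a feel for HBase, here is a rough sketch against its Java
client (the table, column family and row key are placeholders I made
up): single-row writes plus low-latency point reads, which is exactly
what live serving needs:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HBaseHello {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(HBaseConfiguration.create(), "pageviews");
      // Update a single row; writes are atomic per row key.
      Put put = new Put(Bytes.toBytes("usman.example.com"));
      put.add(Bytes.toBytes("stats"), Bytes.toBytes("hits"),
          Bytes.toBytes("42"));
      table.put(put);
      // Low-latency point read of the same row.
      Result result = table.get(new Get(Bytes.toBytes("usman.example.com")));
      System.out.println(Bytes.toString(
          result.getValue(Bytes.toBytes("stats"), Bytes.toBytes("hits"))));
      table.close();
    }
  }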
Finally, you should take a look at this nice blog post from LinkedIn;
it shows a good example of how to use Hadoop's raw processing muscle to
prepare the data and then load the results in parallel into a live
serving system:
http://project-voldemort.com/blog/2009/06/building-a-1-tb-data-cycle-at-linkedin-with-hadoop-and-project-voldemort/
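That pattern maps directly onto your second option below. As a
bare-bones sketch (the /logs input, /summaries/hits output and the log
layout are all assumptions on my part), a MapReduce job like this one
counts hits per page; its small summary output is what you would then
bulk load into HBase, Voldemort, or a plain DB table for the web GUI:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class HitSummary {
    public static class HitMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {
      private static final LongWritable ONE = new LongWritable(1);
      public void map(LongWritable key, Text line, Context ctx)
          throws java.io.IOException, InterruptedException {
        // Assumes the first whitespace-separated field is the page key.
        ctx.write(new Text(line.toString().split("\\s+")[0]), ONE);
      }
    }

    public static class SumReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {
      public void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
          throws java.io.IOException, InterruptedException {
        long total = 0;
        for (LongWritable c : counts) total += c.get();
        ctx.write(key, new LongWritable(total));
      }
    }

    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "hit-summary");
      job.setJarByClass(HitSummary.class);
      job.setMapperClass(HitMapper.class);
      job.setCombinerClass(SumReducer.class);
      job.setReducerClass(SumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(LongWritable.class);
      FileInputFormat.addInputPath(job, new Path("/logs"));
      FileOutputFormat.setOutputPath(job, new Path("/summaries/hits"));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }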
Cheers,
-- amr
Usman Waheed wrote:
Hi All,
Is there a recommended way to extract data from HDFS and
perform some computations on it in order to display the results
on a webpage? One thing that comes to mind is to write simple CGI
Perl scripts that extract the data from HDFS and perform computational
work on it before sending the results to the browser.
or
Maybe run some scripts in the background that summarize the data in
HDFS and insert the results into a DB table. We can then write a web
GUI that interacts with the DB table and displays the desired stats
with graphs using Ploticus. Our data set in HDFS will eventually grow,
so speed will be important.
Thanks,
Usman