Thanks Christophe, Amr and Ted for your recommendations.
Cheers,
Usman
Hey Usman, your second approach is on the right track. You don't want
to have your end users interacting directly with HDFS. The latency is
too high, and it wasn't designed for this.
OTOH, running a "script" (a MapReduce, Streaming, Pig, or Hive job) on
a regular basis and populating a database table is common practice and
a great way to provide interactive access to summary/stats data. You
can use the DBOutputFormat to make this even easier. You'll find
DBOutputFormat and other database tools like Sqoop in Cloudera's
Distribution.
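As a minimal sketch of the Streaming flavor of that pattern, the mapper/reducer pair below counts hits per URL, the kind of summary you would then load into a DB table. The log format (whitespace-delimited, URL in the first field) is an assumption for illustration; on a cluster you would run it via the hadoop-streaming jar rather than the local test harness at the bottom.

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming-style summarization: count hits per URL.
# Log format (whitespace-delimited, URL in field 0) is an assumption.
import sys
from collections import defaultdict

def mapper(lines):
    # Emit (url, 1) for each log line, mimicking a Streaming mapper's
    # key/value output.
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0], 1

def reducer(pairs):
    # Sum counts per key. Streaming delivers keys sorted to the reducer;
    # a dict gives the same totals for a local test.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

if __name__ == "__main__":
    # Local smoke test with three fake log lines.
    sample = ["/index.html 200", "/about 200", "/index.html 304"]
    print(reducer(mapper(sample)))
```

The resulting per-URL totals are what a nightly job would INSERT into the database table the web GUI reads.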
Cheers,
Christophe
On Wed, Jul 8, 2009 at 3:26 PM, Usman Waheed<[email protected]> wrote:
Hi All,
Is there a recommended way to extract data from HDFS and perform some
computations on it in order to display the results on a webpage? One
thing that comes to my mind is to write simple Perl CGI scripts that
extract the data from HDFS and do the computational work before
sending the results to the browser.
or
Maybe run some scripts in the background that summarize the data in
HDFS and insert it into a DB table. We could then write a web GUI that
interacts with the DB table and displays the desired stats with graphs
using ploticus. Our data set in HDFS will eventually grow, so speed
will be important.
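For what it's worth, the second approach can be sketched in a few lines: a periodic job writes pre-aggregated stats into a small summary table, and the web layer only ever reads that table, never HDFS. The table and column names here are made up for illustration, and SQLite stands in for whatever DB you'd actually use.

```python
# Hypothetical summary-table pattern: background job writes aggregates,
# web GUI reads them. SQLite and the daily_stats schema are stand-ins.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_stats (day TEXT, url TEXT, hits INTEGER)")

# A background job (MapReduce, Pig, Hive, ...) would insert rows like:
rows = [("2009-07-08", "/index.html", 1024),
        ("2009-07-08", "/about", 87)]
conn.executemany("INSERT INTO daily_stats VALUES (?, ?, ?)", rows)

# The web GUI then serves fast reads from the small table instead of
# touching HDFS on each request:
top = conn.execute(
    "SELECT url, hits FROM daily_stats WHERE day = ? ORDER BY hits DESC",
    ("2009-07-08",)).fetchall()
print(top)
```

Because the table only holds summaries, it stays small even as the raw HDFS data grows, which is what keeps page loads fast.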
Thanks,
Usman
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/