Thanks Christophe, Amr and Ted for your recommendations.
Cheers,
Usman

Hey Usman, your second approach is on the right track. You don't want
to have your end users interacting directly with HDFS. The latency is
too high, and it wasn't designed for this.

OTOH, running a "script" (a MapReduce, Streaming, Pig, or Hive job) on
a regular basis and populating a database table is common practice and
a great way to provide interactive access to summary/stats data. You
can use DBOutputFormat to make this even easier. You'll find
DBOutputFormat and other database tools like Sqoop in Cloudera's
Distribution.
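For the Streaming flavor of that periodic job, the reducer can be a plain Python script reading sorted "key&lt;TAB&gt;count" lines on stdin. A minimal sketch (the mapper, not shown, would emit one "key&lt;TAB&gt;1" line per log record; all names here are illustrative):

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming reducer: sums per-key counts from
# sorted "key<TAB>count" lines on stdin and writes "key<TAB>total".
import sys

def reduce_stream(lines, out):
    current_key, total = None, 0
    for line in lines:
        key, _, count = line.rstrip("\n").partition("\t")
        if key != current_key:
            # Streaming guarantees keys arrive grouped, so a key change
            # means the previous key's total is final.
            if current_key is not None:
                out.write("%s\t%d\n" % (current_key, total))
            current_key, total = key, 0
        total += int(count or 0)
    if current_key is not None:
        out.write("%s\t%d\n" % (current_key, total))

if __name__ == "__main__":
    reduce_stream(sys.stdin, sys.stdout)
```

You'd run it with something like `hadoop jar .../hadoop-streaming.jar -input logs -output counts -mapper map.py -reducer reduce.py` (paths and script names are placeholders), then load the output into your database table.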

Cheers,
Christophe

On Wed, Jul 8, 2009 at 3:26 PM, Usman Waheed <[email protected]> wrote:
Hi All,

Is there a recommended way to extract data from HDFS, perform some computations on it, and display the results on a webpage? One thing that comes to mind is to write simple Perl CGI scripts that extract the data from HDFS and do the computational work before sending the results to the browser.

or

Maybe run some scripts in the background that summarize the data in HDFS and insert the results into a DB table. We could then write a web GUI that queries the DB table and displays the desired stats, with graphs generated by ploticus. Our data set in HDFS will keep growing, so speed will be important.
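A minimal sketch of that background summarize-and-load step, using SQLite as a stand-in for the real database and an in-memory list of log lines in place of output piped from `hadoop fs -cat` (table and column names are made up):

```python
#!/usr/bin/env python
# Sketch of a periodic summarize-and-load script. In production the
# lines would come from HDFS (e.g. piped from `hadoop fs -cat`) and
# the table would live in a server database; SQLite keeps the sketch
# self-contained. The page_stats table and its columns are hypothetical.
import sqlite3
from collections import Counter

def summarize(lines):
    """Count hits per page from 'page<TAB>rest-of-record' log lines."""
    counts = Counter()
    for line in lines:
        page = line.split("\t", 1)[0].strip()
        if page:
            counts[page] += 1
    return counts

def load(db, counts):
    """Upsert the summary counts into a stats table the web GUI reads."""
    db.execute("CREATE TABLE IF NOT EXISTS page_stats "
               "(page TEXT PRIMARY KEY, hits INTEGER)")
    db.executemany(
        "INSERT OR REPLACE INTO page_stats (page, hits) VALUES (?, ?)",
        counts.items())
    db.commit()

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    load(db, summarize(["/home\t...", "/home\t...", "/about\t..."]))
    for row in db.execute("SELECT page, hits FROM page_stats ORDER BY page"):
        print(row)
```

The web GUI then only ever issues cheap SELECTs against page_stats, so page-load time stays flat as the raw data in HDFS grows.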

Thanks,
Usman


--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/