Hey Usman, your second approach is on the right track. You don't want
to have your end users interacting directly with HDFS. The latency is
too high, and it wasn't designed for this.

OTOH, running a "script" (a MapReduce, Streaming, Pig, or Hive job) on
a regular basis and populating a database table is common practice and
a great way to provide interactive access to summary/stats data. You
can use DBOutputFormat to make this even easier. You'll find
DBOutputFormat and other database tools like Sqoop in Cloudera's
Distro.
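To illustrate the pattern, here's a minimal sketch of the "batch job populates a summary table, web GUI queries it" idea. It uses Python's sqlite3 as a stand-in database and an in-process loop as a stand-in for the scheduled Hadoop job; the table name and event data are made up for the example.

```python
# Sketch of the "periodic job populates a summary table" pattern.
# sqlite3 stands in for the real database; in production the summary
# rows would be written by a scheduled MapReduce/Pig/Hive job (e.g.
# via DBOutputFormat or Sqoop) rather than by this in-process code.
import sqlite3
from collections import Counter

# Raw events, as if freshly pulled from HDFS by the batch job.
raw_events = [
    ("2009-07-08", "page_view"),
    ("2009-07-08", "page_view"),
    ("2009-07-08", "click"),
    ("2009-07-09", "page_view"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_stats (day TEXT, event TEXT, cnt INTEGER)")

# Aggregate offline, then insert only the small summary rows.
counts = Counter(raw_events)
conn.executemany(
    "INSERT INTO daily_stats VALUES (?, ?, ?)",
    [(day, event, n) for (day, event), n in counts.items()],
)
conn.commit()

# The web GUI then issues cheap queries against the summary table
# instead of touching HDFS on every page load.
rows = conn.execute(
    "SELECT day, event, cnt FROM daily_stats ORDER BY day, event"
).fetchall()
print(rows)
```

The point is that the expensive aggregation happens once per batch run, so the web tier only ever sees a small, indexed table.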

Cheers,
Christophe

On Wed, Jul 8, 2009 at 3:26 PM, Usman Waheed<[email protected]> wrote:
> Hi All,
>
> Is there a recommended way to extract data from HDFS and perform some
> computations on the data in order to display the results on a webpage? One
> thing that comes to my mind is to write simple CGI perl scripts that extract
> the data from HDFS and perform computational work on the data before sending
> the results to the browser.
>
> or
>
> Maybe run some scripts in the background that summarize the data in HDFS and
> insert into a DB table. Can then write a web GUI that interacts with the DB
> table and displays the desired stats with graphs using ploticus. Our data
> set in HDFS will eventually grow so speed will be important.
>
> Thanks,
> Usman
>
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>



-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera
