Thanks Christophe, Amr and Ted for your recommendations.
Cheers,
Usman
Hey Usman, your second approach is on the right track. You don't want
to have your end users interacting directly with HDFS. The latency is
too high, and it wasn't designed for this.
OTOH, running a "script" (a MapReduce, Streaming, Pig, or Hive job) on
a regular basis and populating a database table is common practice and
a great way to provide interactive access to summary/stats data. You
can use the DBOutputFormat to make this even easier. You'll find
DBOutputFormat and other database tools like Sqoop in Cloudera's
Distribution.
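As a minimal sketch of the Streaming flavor of that pattern, the mapper/reducer pair below counts hits per URL, the kind of summary you would then load into a DB table. The log format (whitespace-delimited, URL in the first field) is an assumption for illustration; on a cluster you would run it via the hadoop-streaming jar rather than the local test harness at the bottom.

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming-style summarization: count hits per URL.
# Log format (whitespace-delimited, URL in field 0) is an assumption.
import sys
from collections import defaultdict

def mapper(lines):
    # Emit (url, 1) for each log line, mimicking a Streaming mapper's
    # key/value output.
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0], 1

def reducer(pairs):
    # Sum counts per key. Streaming delivers keys sorted to the reducer;
    # a dict gives the same totals for a local test.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

if __name__ == "__main__":
    # Local smoke test with three fake log lines.
    sample = ["/index.html 200", "/about 200", "/index.html 304"]
    print(reducer(mapper(sample)))
```

The resulting per-URL totals are what a nightly job would INSERT into the database table the web GUI reads.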
Cheers,
Christophe
On Wed, Jul 8, 2009 at 3:26 PM, Usman Waheed<[email protected]> wrote:
Hi All,
Is there a recommended way to extract data from HDFS and perform some
computations on it in order to display the results on a webpage? One
thing that comes to my mind is to write simple Perl CGI scripts that
extract the data from HDFS and do the computational work before
sending the results to the browser.
or
Maybe run some scripts in the background that summarize the data in
HDFS and insert it into a DB table. We could then write a web GUI that
interacts with the DB table and displays the desired stats with graphs
using ploticus. Our data set in HDFS will eventually grow, so speed
will be important.
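For what it's worth, the second approach can be sketched in a few lines: a periodic job writes pre-aggregated stats into a small summary table, and the web layer only ever reads that table, never HDFS. The table and column names here are made up for illustration, and SQLite stands in for whatever DB you'd actually use.

```python
# Hypothetical summary-table pattern: background job writes aggregates,
# web GUI reads them. SQLite and the daily_stats schema are stand-ins.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_stats (day TEXT, url TEXT, hits INTEGER)")

# A background job (MapReduce, Pig, Hive, ...) would insert rows like:
rows = [("2009-07-08", "/index.html", 1024),
        ("2009-07-08", "/about", 87)]
conn.executemany("INSERT INTO daily_stats VALUES (?, ?, ?)", rows)

# The web GUI then serves fast reads from the small table instead of
# touching HDFS on each request:
top = conn.execute(
    "SELECT url, hits FROM daily_stats WHERE day = ? ORDER BY hits DESC",
    ("2009-07-08",)).fetchall()
print(top)
```

Because the table only holds summaries, it stays small even as the raw HDFS data grows, which is what keeps page loads fast.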
Thanks,
Usman
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/