Hey Usman, your second approach is on the right track. You don't want your end users interacting directly with HDFS: the latency is too high, and HDFS wasn't designed for that kind of access.
OTOH, running a "script" (a MapReduce, Streaming, Pig, or Hive job) on a regular basis and populating a database table is common practice and a great way to provide interactive access to summary/stats data. You can use DBOutputFormat to make this even easier. You'll find DBOutputFormat and other database tools, like Sqoop, in Cloudera's Distro.

Cheers,
Christophe

On Wed, Jul 8, 2009 at 3:26 PM, Usman Waheed <[email protected]> wrote:
> Hi All,
>
> Is there a recommended way to extract data from HDFS and perform some
> computations on it in order to display the results on a webpage? One
> thing that comes to mind is to write simple Perl CGI scripts that extract
> the data from HDFS and do the computational work before sending
> the results to the browser.
>
> or
>
> Maybe run some scripts in the background that summarize the data in HDFS and
> insert it into a DB table. We could then write a web GUI that queries the DB
> table and displays the desired stats with graphs using ploticus. Our data
> set in HDFS will eventually grow, so speed will be important.
>
> Thanks,
> Usman
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

--
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera
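P.S. The summarize-then-load pattern can be sketched roughly as below. This is a minimal, hypothetical illustration in Python with SQLite: the record format (tab-separated key/count lines), the `stats` table, and all function names are made up for the example. In practice the input would be the output of a periodic MapReduce/Pig/Hive job (pulled out of HDFS, or written directly to the database with DBOutputFormat or Sqoop), and the table would live in a real RDBMS that the web GUI queries.

```python
import sqlite3

# Hypothetical job output: tab-separated key<TAB>count lines, as a
# streaming/Pig/Hive job might emit. Stand-in for data read from HDFS.
raw_lines = [
    "us\t120",
    "de\t45",
    "us\t30",
]

def summarize(lines):
    """Aggregate key<TAB>count lines into per-key totals."""
    totals = {}
    for line in lines:
        key, count = line.rstrip("\n").split("\t")
        totals[key] = totals.get(key, 0) + int(count)
    return totals

def load_into_db(conn, totals):
    """Upsert the summary into a stats table the web GUI can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stats (key TEXT PRIMARY KEY, total INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO stats (key, total) VALUES (?, ?)",
        totals.items(),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_into_db(conn, summarize(raw_lines))
print(dict(conn.execute("SELECT key, total FROM stats ORDER BY key")))
# -> {'de': 45, 'us': 150}
```

The web-facing CGI then only ever touches the `stats` table, so page latency stays flat no matter how large the HDFS data set grows.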
