Assuming the intent is to transfer HTTP logs into Hadoop, there are two
ways of doing it that I can see:

1. Have a writer plugged into the HTTP server, or simply tail the logs
into the writer's input stream, which then writes them into HDFS files.
2. Enable FTP access to your server's log directory. You can then pull
the data from the server using Hadoop shell commands via FTPFileSystem,
which makes transfers between HDFS and an FTP site seamless. Currently
FTPFileSystem is part of hadoop-trunk and is slated for the 0.18
release.
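For option 1, a minimal sketch of the "writer" idea: a loop that copies a
log stream to a destination stream. For simplicity this demo writes to a
local file; in a real deployment the output would be an HDFS stream
(e.g. one obtained from Hadoop's FileSystem.create() on an hdfs:// path,
depending on your Hadoop version). The class name, buffer size, and demo
log line are all made up for illustration.

```java
import java.io.*;

// Sketch of option 1: consume a log input stream and append it to a
// destination output stream. Swap the FileOutputStream below for an
// HDFS output stream in a real setup.
public class LogPipe {
    // Copy everything from in to out, 4 KB at a time.
    public static void pipe(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Demo: pipe a small in-memory "log line" into a local file.
        ByteArrayInputStream logs = new ByteArrayInputStream(
                "GET /index.html 200\n".getBytes("UTF-8"));
        FileOutputStream out = new FileOutputStream("access.log.copy");
        try {
            pipe(logs, out);
        } finally {
            out.close();
        }
    }
}
```

In practice you would feed the input stream from `tail -f` on the access
log (or a hook in the HTTP server itself) and keep the loop running.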
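For option 2, once FTPFileSystem is available, the transfer is just a
normal Hadoop shell copy between an ftp:// URI and an hdfs:// URI. The
host names, credentials, and paths below are placeholders, not real
values; adjust them to your cluster and log server.

```shell
# Copy an HTTP access log from the FTP-exposed log directory into HDFS.
# Requires a Hadoop build that includes FTPFileSystem (hadoop-trunk / 0.18+).
hadoop fs -cp \
    ftp://user:password@logserver/var/log/httpd/access_log \
    hdfs://namenode:9000/logs/access_log
```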

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Khanh Nguyen
Sent: Monday, July 14, 2008 11:19 PM
To: [email protected]
Subject: Pulling input from http?

Hello,

I am struggling to get Hadoop to pull input from an HTTP source, but so
far no luck. Is it even possible, given that in this case the input is
not placed in Hadoop's file system? Example code would be ideal.

Thanks.

-k
