Assuming the intent is to transfer HTTP logs into Hadoop, there are two ways of doing it that I can tell:
1. Have a writer plugged into the HTTP server, or simply tail the logs into the writer's input stream, which then writes them into HDFS files.

2. Enable FTP access on your server's log directory. This way you can pull the data from the server using Hadoop shell commands via FTPFileSystem, which makes transfers between HDFS and an FTP site seamless. FTPFileSystem is currently part of hadoop-trunk and slated for the 0.18 release.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Khanh Nguyen
Sent: Monday, July 14, 2008 11:19 PM
To: [email protected]
Subject: Pulling input from http?

Hello,

I am struggling to get Hadoop to pull input from an HTTP source, but so far no luck. Is it even possible, given that in this case the input is not placed in Hadoop's file system? Example code would be ideal.

Thanks.

-k
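A rough sketch of what both options could look like from the command line. The host name, log paths, and FTP credentials below are made-up placeholders, and stdin support for `hadoop fs -put` may vary by version, so treat this as illustrative rather than exact:

```shell
# Option 1: tail the live access log and stream it into an HDFS file.
# "-" tells `hadoop fs -put` to read from stdin; /logs/access.log is a
# hypothetical HDFS destination path.
tail -F /var/log/httpd/access_log | hadoop fs -put - /logs/access.log

# Option 2: pull the log directory over FTP using FTPFileSystem
# (hadoop-trunk, slated for 0.18). The user, password, host, and paths
# are placeholders for your server's actual FTP setup.
hadoop fs -cp ftp://user:password@logserver.example.com/var/log/httpd/access_log /logs/
```

Option 2 has the advantage that no extra process needs to run on the web server beyond the FTP daemon; the transfer is driven entirely from the Hadoop side.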
