If you are looking to crawl websites, you can take a look at Apache Nutch and how it connects with Apache Hadoop.
I'll let others comment on why we do not recommend this, but I can certainly think of one reason it has to be done with care: on a cluster with many task slots, all the tasks may hit a particular site at the same time.

On 10-Jan-2012, at 7:18 PM, Jayunit100 wrote:

> At the cloudera course, they said this is a bad idea, but im working at a
> place that does just this... In the reducers..... the answer is Yes.... You
> can make http requests in Hadoop jobs.
>
> I'd like to know more about others thoughts on this.... Is it customary ?
>
> Jay Vyas
> MMSB
> UCHC
>
> On Jan 10, 2012, at 4:23 AM, <[email protected]> wrote:
>
>> Hi ,
>>
>> Is it possible to get data from web services using Hadoop MR jobs?
>>
>> Regards,
>>
>> Shreya
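To make the "many tasks hitting one site at once" concern concrete, here is a minimal sketch of per-task throttling, written in Python as it might be used from a streaming-style task. Nothing here is a Hadoop API: `min_interval_seconds`, `Throttle`, and `fetch_all` are hypothetical helpers, and both the number of concurrent tasks and the request budget the site can tolerate are assumptions you would have to supply yourself.

```python
import time
import urllib.request


def min_interval_seconds(concurrent_tasks, site_budget_rps):
    """Delay each task should leave between its own requests so that all
    tasks together stay at or below site_budget_rps requests per second.
    Both arguments are assumptions supplied by the job author."""
    if concurrent_tasks <= 0 or site_budget_rps <= 0:
        raise ValueError("both arguments must be positive")
    return concurrent_tasks / site_budget_rps


class Throttle:
    """Spaces successive calls to wait() at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self._interval = interval
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            # Too early: sleep until the next permitted request time.
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval


def http_get(url):
    """Plain blocking GET using only the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def fetch_all(urls, throttle, fetch=http_get):
    """Fetch each URL in turn, pausing via the throttle before each request."""
    results = []
    for url in urls:
        throttle.wait()
        results.append((url, fetch(url)))
    return results
```

For example, with 200 tasks running at once and a site that should see at most 10 requests per second, `min_interval_seconds(200, 10)` gives 20 seconds between requests per task. This only limits each task locally; it does not coordinate across tasks, so it is a rough safeguard and not a substitute for a crawler such as Nutch.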
