If you are looking to crawl websites, you can take a look at Apache Nutch and 
how it connects with Apache Hadoop.

I'll let others comment on why this is generally not recommended, but one 
reason to be careful is easy to see: on a cluster with many task slots, every 
running task can end up hitting the same site at the same time.
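
To make that concern concrete, here is a minimal sketch of a per-host throttle 
that a map or reduce task could call before each HTTP request. HostThrottle, 
reserve() and acquire() are hypothetical names of my own, not part of Hadoop 
or Nutch, and the delay value is an assumption. It only spaces out requests 
made from within a single task JVM; it does nothing about other tasks running 
in parallel.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: spaces out requests to the same host within one task JVM. */
final class HostThrottle {
    private final long minDelayMillis;
    // Earliest time (epoch millis) at which each host may be contacted again.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    HostThrottle(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /**
     * Reserves the next request slot for the URL's host and returns how many
     * milliseconds the caller should wait before sending the request.
     */
    synchronized long reserve(String url, long nowMillis) {
        String host = URI.create(url).getHost();
        long earliest = nextAllowed.getOrDefault(host, nowMillis);
        long start = Math.max(earliest, nowMillis);
        nextAllowed.put(host, start + minDelayMillis);
        return start - nowMillis;
    }

    /** Blocks until the URL's host may be contacted again. */
    void acquire(String url) throws InterruptedException {
        long wait = reserve(url, System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }
}
```

A task would create one HostThrottle and call acquire(url) before opening each 
connection. Because the state is local to the JVM, this only helps across the 
cluster if the number of concurrent tasks is also capped, or if the job is 
partitioned so that all URLs for a given host go to the same reducer.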

On 10-Jan-2012, at 7:18 PM, Jayunit100 wrote:

> At the Cloudera course, they said this is a bad idea, but I'm working at a 
> place that does just this, in the reducers. The answer is yes: you can make 
> HTTP requests in Hadoop jobs.
> 
> I'd like to know more about others' thoughts on this. Is it customary?
> 
> Jay Vyas 
> MMSB
> UCHC
> 
> On Jan 10, 2012, at 4:23 AM, <[email protected]> wrote:
> 
>> Hi,
>> 
>> Is it possible to get data from web services using Hadoop MR jobs?
>> 
>> Regards,
>> Shreya