nutch-user  

Re: CRAWLING USING HADOOP

brainstorm
Sun, 13 Jul 2008 11:50:55 -0700

It looks like you haven't run:

bin/hadoop namenode -format

*before anything else* (run start-all.sh only *after* formatting the
namenode)... this is just a guess. I *do* recommend starting from
scratch and following this howto strictly, step by step:

http://wiki.apache.org/nutch/NutchHadoopTutorial

It worked for me... good luck! ;)
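
To make the ordering concrete, here is a rough sketch of the sequence I mean. It assumes you run it from the Hadoop/Nutch install directory on the master node and that the scripts live under bin/ (you seem to use bin_temp/, so adjust the paths). The stop-all.sh step, the final "dfs -ls /" check, and the run/format_then_start helper names are my own additions for illustration, not something taken from the tutorial. Be aware that formatting the namenode wipes any existing HDFS metadata.

```shell
#!/bin/sh
# Sketch only: format the namenode first, then start the daemons.
# Assumptions: current directory is the Hadoop/Nutch install dir on the
# master node, and the scripts are under bin/ (adjust if yours differ).

# Hypothetical helper: with DRY_RUN=1 it prints the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the steps, in the order suggested above.
format_then_start() {
    # Stop daemons left over from an earlier attempt; a failure here is ignored.
    run bin/stop-all.sh
    # Format the namenode FIRST. This erases existing HDFS metadata.
    run bin/hadoop namenode -format || return 1
    # Only now start the DFS and MapReduce daemons.
    run bin/start-all.sh || return 1
    # Quick check that the namenode answers before trying a crawl.
    run bin/hadoop dfs -ls /
}
```

Setting DRY_RUN=1 before calling format_then_start only prints the four commands, so you can check the order before touching the cluster.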

On Fri, Jul 11, 2008 at 7:57 AM, kranthi reddy <[EMAIL PROTECTED]> wrote:
> Hi,
>
>  I am trying to crawl a few sites using Nutch and Hadoop. I have a cluster
> of 10 PCs and I have given Nutch as a job file to Hadoop. I am able to
> execute most commands, like
>
>  bin_temp/hadoop dfs -put xxx yyy  (also ls, mkdir, etc.)
>
> But when I try to run Nutch, I get the following error:
>
> bin_temp/nutch crawl tempcrawl/urls -dir tempcrawl/crawl -depth 1
>
> Exception in thread "main" java.net.SocketTimeoutException: timed out waiting for rpc response
>        at org.apache.hadoop.ipc.Client.call(Client.java:473)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
>        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:247)
>        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:105)
>        at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
>        at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
>        at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:83)
>
> Someone please help me out.
>
> When I remove the hadoop-env.sh, hadoop-site.xml and masters files and
> replace slaves with "localhost", I am able to crawl perfectly well (but
> only on the master PC :(( )
>
> Thank you in advance.
> Kranthi reddy.B
>