On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > Here you go. local filesystem and a single job tracker on another
> > machine. When the tasktracker and jobtracker are on the same box there
> > isn't a problem. When they are on different machines it runs into
> > issues.
> > 
> > This is using mapred.local.dir on the local machine (not shared between
> > sbider4 and sbider5):
> 
> >         parsing /home/sitesell/localt/taskTracker/task_m_o59djj/job.xml
> >         [Fatal Error] :-1:-1: Premature end of file.
> 
> What is mapred.system.dir?  That must be shared.  Also, filenames you 
> pass to commands must be pathnames that work on all hosts.

I managed to get past all of the initial injection problems by running a
local crawl (no jobtracker) which created the crawldb/current/part-00000
files. So I was able to do a real inject, with jobtracker, for all of
the urls system wide without any complaints about files or directories
not existing.

Now, when trying to run a generate with a jobtracker it seems to have a
hard time finding the temporary working areas from one job to the next.
I cannot figure out where it is creating generate-temp-908680235. With
NDFS it would be under /user/$USER/.

<-- nutch generate -->
051107 091256 topN: 10000
051107 091256 Generator: starting
051107 091256 Generator: segment: /opt/sitesell/sbider_data/test2/segments/20051107091256
051107 091256 Generator: Selecting most-linked urls due for fetch.
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051107 091256 Client connection to 192.168.100.14:5464: starting
051107 091256 Running job: job_xhvq9b
051107 091258  map 0%
051107 091300  map 5%
051107 091303  map 16%
051107 091305  map 21%
051107 091306  map 26%
051107 091308  map 32%
051107 091309  map 37%
051107 091312  map 47%
051107 091315  map 58%
051107 091318  map 68%
051107 091320  map 74%
051107 091321  map 79%
051107 091324  map 89%
051107 091327  map 100%
051107 091330  reduce 5%
051107 091332  reduce 11%
051107 091333  reduce 16%
051107 091335  reduce 21%
051107 091337  reduce 26%
051107 091339  reduce 37%
051107 091342  reduce 47%
051107 091344  reduce 53%
051107 091345  reduce 58%
051107 091347  reduce 63%
051107 091348  reduce 68%
051107 091351  reduce 79%
051107 091354  reduce 89%
051107 091357  reduce 100%
051107 091359 Job complete: job_xhvq9b
051107 091359 Generator: Partitioning selected urls by host, for politeness.
051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051107 091359 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
Exception in thread "main" java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /home/sitesell/local/jobTracker/job_h22fvi.xml , nutch-site.xml
        at org.apache.nutch.ipc.Client.call(Client.java:294)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        at $Proxy0.submitJob(Unknown Source)
        at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:213)
        at org.apache.nutch.crawl.Generator.main(Generator.java:258)

[EMAIL PROTECTED] sbider_data]$ cat /home/sitesell/local/jobTracker/job_h22fvi.xml | grep input
<property><name>mapred.input.format.class</name><value>org.apache.nutch.mapred.SequenceFileInputFormat</value></property>
<property><name>mapred.input.dir</name><value>generate-temp-908680235</value></property>
<property><name>mapred.input.value.class</name><value>org.apache.nutch.io.UTF8</value></property>
<property><name>mapred.input.key.class</name><value>org.apache.nutch.crawl.CrawlDatum</value></property>
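
One possible reading of the mapred.input.dir value above: generate-temp-908680235
is a relative name, and on a plain local filesystem a relative name resolves
against the working directory of whichever process opens it. The sketch below
only illustrates that general Java behaviour. It is not taken from the Nutch
source, the class and method names are hypothetical, and both working
directories are assumptions (the first is guessed from the shell prompt, the
second is made up for the machine running the jobtracker).

```java
import java.io.File;

public class RelativePathDemo {
    // Resolve a possibly relative name against a given working directory,
    // the way a process started in that directory would see it.
    static String resolve(String workingDir, String name) {
        File f = new File(name);
        if (f.isAbsolute()) {
            return f.getPath();          // absolute names are the same everywhere
        }
        return new File(workingDir, name).getPath();
    }

    public static void main(String[] args) {
        String name = "generate-temp-908680235";

        // Assumed working directory of the client that ran "nutch generate".
        System.out.println(resolve("/opt/sitesell/sbider_data", name));

        // Hypothetical working directory of the jobtracker on the other host.
        System.out.println(resolve("/home/sitesell", name));

        // What this JVM itself would use (depends on where it was started).
        System.out.println(new File(name).getAbsolutePath());
    }
}
```

If that reading is right, the two processes would look in different places for
the same name, and an absolute path that exists on every host would avoid the
mismatch.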

-- 
Rod Taylor <[EMAIL PROTECTED]>
