[ 
https://issues.apache.org/jira/browse/NUTCH-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Hosking updated NUTCH-1761:
---------------------------------

    Description: 
The crawl script that ships with every version of Nutch I have checked sets 
the local/distributed operating mode using a relative path (i.e. 
"../\*nutch-\*.job").

Bash resolves this path relative to the directory the crawl script was called 
from, not the script's actual location.

The result is that, when started from anywhere other than the bin directory, 
the script thinks it is in local mode because it cannot find the job file.  
During a crawl, jobs are still submitted to Hadoop properly, but the if 
statements that test for local (or distributed) mode take the wrong branch, 
which gives strange results or crashes.

Using the first bash snippet from [here|https://stackoverflow.com/a/246128] I 
have modified the crawl script to look for a job file relative to the script 
location on disk.
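
For readers without the attachment, the idea can be sketched roughly as 
follows. This is an illustration only, not the contents of the patch: the 
variable names SCRIPT_DIR and mode are made up here, and this simple form of 
the directory lookup does not follow symlinks.

```shell
#!/usr/bin/env bash
# Illustrative sketch; SCRIPT_DIR and mode are hypothetical names, not from the patch.

# Resolve the directory that contains this script, independent of the
# caller's working directory. (This simple form does not follow symlinks.)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Look for the job file relative to the script, not relative to $PWD.
mode=local
for job in "$SCRIPT_DIR"/../*nutch-*.job; do
  # An unmatched glob stays literal, so confirm that the file really exists.
  if [ -f "$job" ]; then
    mode=distributed
    break
  fi
done

echo "script dir: $SCRIPT_DIR, mode: $mode"
```

With a lookup of this kind, calling the script as bin/crawl from the 
deployment directory or by an absolute path from elsewhere should select the 
same mode.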

I have attached a patch with my modifications.

  was:
The crawl script that comes with all the version of Nutch I have checked set 
the local/distributed operating mode using a relative path (i.e. 
"../*nutch-*.job").

Bash seems to be taking this as relative to the location that the crawl script 
was called from, not the scripts actual location.

The result is that the script thinks it is in local mode because it cannot find 
the job file.  When trying to carry out a crawl jobs are submitted to Hadoop 
properly, but ifs that test for local (or not) mode fail and give strange 
results/result in crashes.

Using the first bash snippet from [here|https://stackoverflow.com/a/246128] I 
have modified the crawl script to look for a job file relative to the script 
location on disk.

I have attached a patch with my modifications.


> Crawl script fails to find job file if not started from inside bin dir
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-1761
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1761
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.8, 2.2.1
>         Environment: Ubuntu Server 13.10
>            Reporter: David Hosking
>            Priority: Minor
>              Labels: bash, distributed, script
>         Attachments: NUTCH-1761.patch
>
>
> The crawl script that ships with every version of Nutch I have checked sets 
> the local/distributed operating mode using a relative path (i.e. 
> "../\*nutch-\*.job").
> Bash resolves this path relative to the directory the crawl script was 
> called from, not the script's actual location.
> The result is that, when started from anywhere other than the bin directory, 
> the script thinks it is in local mode because it cannot find the job file.  
> During a crawl, jobs are still submitted to Hadoop properly, but the if 
> statements that test for local (or distributed) mode take the wrong branch, 
> which gives strange results or crashes.
> Using the first bash snippet from [here|https://stackoverflow.com/a/246128] I 
> have modified the crawl script to look for a job file relative to the script 
> location on disk.
> I have attached a patch with my modifications.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
