[jira] Created: (NUTCH-749) Fetching the url from crawldb

salima abdulsalam (JIRA) Fri, 21 Aug 2009 06:38:43 -0700

Fetching the url from crawldb
-----------------------------

                 Key: NUTCH-749
                 URL: https://issues.apache.org/jira/browse/NUTCH-749
             Project: Nutch
          Issue Type: Bug
         Environment: Nutch with solr integration
            Reporter: salima abdulsalam



Hi,
I am new to using Nutch with Solr. I followed the guide at
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ for the
integration. I am getting an error while fetching the URL from the crawldb.

I set SEGMENT as follows:

  export SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1`

and then ran this command:

  bin/nutch fetch $SEGMENT -noParsing
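
For reference, the segment selection can also be written as a small shell function. This is only a sketch: `latest_segment` is a hypothetical helper name, not part of Nutch, and it assumes that segment directory names are timestamps (as in crawl/segments/20090821062021), so sorting by name picks the same directory as the `ls -tr` form.

```shell
# Print the path of the most recent segment under the given directory.
# Assumption: segment names are timestamps (e.g. 20090821062021), so the
# lexicographically last name is also the newest segment.
latest_segment() {
  segments_dir="$1"
  newest=$(ls "$segments_dir" | sort | tail -1)
  # Fail instead of printing a path with an empty segment name.
  [ -n "$newest" ] || return 1
  printf '%s/%s\n' "$segments_dir" "$newest"
}

# Usage, with the directory layout from this report:
#   SEGMENT=$(latest_segment crawl/segments) && export SEGMENT
#   bin/nutch fetch "$SEGMENT" -noParsing
```

Quoting "$SEGMENT" keeps the shell from expanding or splitting the path before it reaches bin/nutch; it does not change how Hadoop interprets the path afterwards.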

After running the command, I get the following error:


Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: crawl/segments/20090821062021
Exception in thread "main" java.io.IOException: Illegal file pattern: Expecting set closure character or end of range, or } for glob 20090821062021 at 30
        at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1086)
        at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1071)
        at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:989)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:955)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:964)
        at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:904)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:868)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:159)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
        at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:101)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:969)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1003)

Can anyone help with this?

Thanks,
Salima


 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
