Re: linkdb/current/part-00000/data does not exist

veeresh beeram Sun, 22 Feb 2015 12:06:06 -0800

Hi,

I was unable to reproduce the linkdb error.
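
Since the file reportedly appears and disappears during the crawl, it may help to check the linkdb directory just before the invert step runs. Below is a minimal sketch, not part of Nutch. It assumes each part-* directory under linkdb/current is a Hadoop MapFile, that is, a directory holding a `data` and an `index` file. The function name `incomplete_parts` is made up for illustration.

```python
import sys
from pathlib import Path


def incomplete_parts(linkdb_dir):
    """Map each part-* directory under <linkdb_dir>/current to the files it lacks.

    Assumption: every part-* directory should contain both a `data` and an
    `index` file (the usual Hadoop MapFile layout).
    """
    current = Path(linkdb_dir) / "current"
    if not current.is_dir():
        return {"current": ["(directory missing)"]}
    problems = {}
    for part in sorted(current.glob("part-*")):
        absent = [name for name in ("data", "index") if not (part / name).is_file()]
        if absent:
            problems[part.name] = absent
    return problems


if __name__ == "__main__":
    # Usage: python check_linkdb.py nsfacadis3Crawl/linkdb
    report = incomplete_parts(sys.argv[1])
    if report:
        for name, absent in report.items():
            print(f"{name}: missing {', '.join(absent)}")
    else:
        print("all part directories look complete")
```

Running this between crawl rounds should show whether the part directory is incomplete before the merge starts or only goes missing while the job is running.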


The NSIDC ADE 403 Forbidden error occurs because NSIDC seems to be blocking
User-Agent strings that contain "nutch".
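
One way to confirm this is to request the same page with two agent strings that differ only in whether they mention "nutch", and compare the status codes. The sketch below uses only the Python standard library. The agent strings and the helper names `mentions_token` and `status_for` are made up for illustration, and the URL is supplied on the command line rather than assumed here.

```python
import sys
import urllib.error
import urllib.request


def mentions_token(user_agent, token="nutch"):
    """Return True if the agent string contains the token, ignoring case."""
    return token.lower() in user_agent.lower()


def status_for(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent header and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code


if __name__ == "__main__":
    # Usage: python ua_probe.py <url-to-test>
    url = sys.argv[1]
    # Hypothetical agent strings; only the first one mentions "nutch".
    for agent in ("MyCrawler/Nutch-1.x", "MyCrawler/1.0"):
        print(agent, mentions_token(agent), status_for(url, agent))
```

If only the agent string that mentions "nutch" gets a 403, the block is keyed on the User-Agent and not on the client address.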

--
Thanks,
Veeresh

On 20 February 2015 at 15:26, Shuo Li <[email protected]> wrote:

> Hi,
>
> I'm trying to crawl NSF ACADIS with nutch-selenium. I ran into a problem:
> *linkdb/current/part-00000/data does not exist.* I checked my directory and
> my files during crawling, and this file sometimes exists and sometimes
> disappears, which is quite strange.
>
> Another problem is that when we crawl NSIDC ADE, we get a 403 Forbidden
> error. Does this mean NSIDC ADE is blocking us?
>
> The log of the first error is at the bottom of this email. Any help would
> be appreciated.
>
> Regards,
> Shuo Li
>
> LinkDb: merging with existing linkdb: nsfacadis3Crawl/linkdb
> LinkDb: java.io.FileNotFoundException: File file:/vagrant/nutch/runtime/local/nsfacadis3Crawl/linkdb/current/part-00000/data does not exist.
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
> at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:47)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
> at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
> at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
> at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
> at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:208)
> at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:316)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:276)
>
