Re: readlinkdb fails to dump linkdb

Doğacan Güney Thu, 04 Dec 2008 10:27:12 -0800

On Thu, Dec 4, 2008 at 11:33 AM, brainstorm <[EMAIL PROTECTED]> wrote:
> On Wed, Dec 3, 2008 at 8:29 PM, Doğacan Güney <[EMAIL PROTECTED]> wrote:
>> On Wed, Dec 3, 2008 at 8:55 PM, brainstorm <[EMAIL PROTECTED]> wrote:
>>> Using nutch 0.9 (hadoop 0.17.1):
>>>
>>> [EMAIL PROTECTED] working]$ bin/nutch readlinkdb /home/hadoop/crawl-20081201/crawldb -dump crawled_urls.txt
>>> LinkDb dump: starting
>>> LinkDb db: /home/hadoop/crawl-urls-20081201/crawldb
>>                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> It seems you are providing a crawldb as argument. You should pass the linkdb.
>
>
> Thanks a lot for the hint, but I cannot find a "linkdb" dir anywhere on
> the HDFS :_/ Can you point me to where it should be?


A linkdb is created with the invertlinks command, e.g.:

bin/nutch invertlinks crawl/linkdb crawl/segments/....
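
Put together, the two steps might look like the sketch below. It assumes the crawl lives under crawl/ and that your version's invertlinks accepts -dir <segmentsDir> (run bin/nutch invertlinks with no arguments to see the usage message). CRAWL_DIR, NUTCH, DRY_RUN and the run helper are illustrative names only, not part of Nutch.

```shell
#!/bin/sh
# Sketch: build the linkdb first, then dump it with readlinkdb.
# CRAWL_DIR is an assumed layout; point it at your own crawl directory.
CRAWL_DIR="${CRAWL_DIR:-crawl}"
NUTCH="${NUTCH:-bin/nutch}"

# With DRY_RUN=1 (the default here) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# 1. Invert the links found in all segments into a linkdb.
run "$NUTCH" invertlinks "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments"

# 2. Dump the linkdb (not the crawldb) into an output directory.
run "$NUTCH" readlinkdb "$CRAWL_DIR/linkdb" -dump "$CRAWL_DIR/linkdb_dump"
```

Set DRY_RUN=0 to actually execute the commands. The point is that readlinkdb is given the linkdb path, not the crawldb path.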

>
>
>>> java.io.IOException: Type mismatch in value from map: expected org.apache.nutch.crawl.Inlinks, recieved org.apache.nutch.crawl.CrawlDatum
>>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
>>>        at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
>>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>>>        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>>
>>> LinkDbReader: java.io.IOException: Job failed!
>>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
>>>        at org.apache.nutch.crawl.LinkDbReader.processDumpJob(LinkDbReader.java:110)
>>>        at org.apache.nutch.crawl.LinkDbReader.run(LinkDbReader.java:127)
>>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>        at org.apache.nutch.crawl.LinkDbReader.main(LinkDbReader.java:114)
>>>
>>> This is the first time I have used readlinkdb, and the rest of the crawling
>>> process is working OK. I've searched JIRA and found no related bug.
>>>
>>> I've also tried the latest Nutch trunk, but DFS is not working for me:
>>>
>>> [EMAIL PROTECTED] trunk]$ bin/hadoop dfs -ls
>>>
>>> Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.DistributedFileSystem
>>>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:648)
>>>        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1334)
>>>        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>>>        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>>>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>>>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>>>        at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
>>>        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1698)
>>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1847)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.DistributedFileSystem
>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>>        at java.lang.Class.forName0(Native Method)
>>>        at java.lang.Class.forName(Class.java:247)
>>>        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:628)
>>>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:646)
>>>        ... 10 more
>>>
>>> Should I file both bugs in JIRA?
>>>
>>
>> I am not sure about this one, but did you try ant clean; ant? It may be a
>> version mismatch.
>
>
> Yes, I did ant clean && ant before trying the above command. I also
> tried, unsuccessfully, to upgrade the filesystem, and even re-created it
> from scratch:
>
> https://issues.apache.org/jira/browse/HADOOP-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650556#action_12650556
>
>
>>
>> --
>> Doğacan Güney
>>
>



-- 
Doğacan Güney
