> [ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162170#comment-13162170 ]
>
> Marek Bachmann commented on NUTCH-1084:
> ---------------------------------------
>
> The same Exception is thrown when using the -get option in readseg.

yes
> Is there some workaround yet? It is not efficient to copy the whole seg
> dir to a local drive... :-/

no, unfortunately. I tried to crack the thing but ended up nowhere. I still
think it's something to do with the writable for metadata.

> > ReadDB url throws exception
> > ---------------------------
> >
> >                 Key: NUTCH-1084
> >                 URL: https://issues.apache.org/jira/browse/NUTCH-1084
> >             Project: Nutch
> >          Issue Type: Bug
> >    Affects Versions: 1.3
> >            Reporter: Markus Jelsma
> >            Assignee: Markus Jelsma
> >             Fix For: 1.5
> >
> > Readdb -url suffers from two problems:
> > 1. it trips over the _SUCCESS file generated by newer Hadoop versions
> > 2. it throws "can't find class: org.apache.nutch.protocol.ProtocolStatus" (???)
> > The first problem can be remedied by not allowing the injector or updater
> > to write the _SUCCESS file. Until now that's the solution implemented for
> > similar issues. I've not been successful in making the Hadoop readers
> > simply skip the file. The second issue seems a bit strange and did not
> > happen on a local checkout. I'm not yet sure whether this is a Hadoop
> > issue or something being corrupt in the CrawlDB.
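On the _SUCCESS side of the report: since suppressing the marker at write time is the workaround used so far, the reader-side alternative would be to filter the marker out when listing part files. A minimal plain-Java sketch of that idea (this is not Nutch or Hadoop code; `isDataFile` is a hypothetical helper mirroring Hadoop's convention that names starting with `_` or `.` are not data files):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SkipSuccessSketch {
    // Hypothetical filter mirroring Hadoop's hidden-file convention:
    // names starting with "_" (e.g. _SUCCESS) or "." are not data files.
    static boolean isDataFile(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList("part-00000", "part-00001", "_SUCCESS");
        // Keep only the real part files before handing them to a reader.
        List<String> parts = listing.stream()
                .filter(SkipSuccessSketch::isDataFile)
                .collect(Collectors.toList());
        System.out.println(parts); // [part-00000, part-00001]
    }
}
```

In Hadoop itself the equivalent hook would be a PathFilter applied when enumerating the directory, which is presumably what "make the Hadoop readers simply skip the file" would come down to.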
> > Here's the stack trace:
> > {code}
> > Exception in thread "main" java.io.IOException: can't find class:
> > org.apache.nutch.protocol.ProtocolStatus because
> > org.apache.nutch.protocol.ProtocolStatus
> >         at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
> >         at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
> >         at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
> >         at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
> >         at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:524)
> >         at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:105)
> >         at org.apache.nutch.crawl.CrawlDbReader.get(CrawlDbReader.java:383)
> >         at org.apache.nutch.crawl.CrawlDbReader.readUrl(CrawlDbReader.java:389)
> >         at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:514)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
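For what it's worth, the "can't find class" IOException at the top of the trace comes out of AbstractMapWritable.readFields, which resolves value-class names stored in the stream on the reading JVM. A plain-Java sketch of that failure mode (`resolve` is a hypothetical stand-in, not the Hadoop method; only the shape of the failure is the point):

```java
import java.io.IOException;

public class ClassResolveSketch {
    // Stand-in for a reader that stores class *names* in the stream and
    // must resolve them at read time: a class missing from the reading
    // JVM's classpath surfaces as an IOException, as in the trace above.
    static Class<?> resolve(String className) throws IOException {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IOException("can't find class: " + className, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A class on the classpath resolves fine.
        System.out.println(resolve("java.lang.String").getSimpleName()); // String
        // One that isn't (here: no Nutch jar on the classpath) fails the read.
        try {
            resolve("org.apache.nutch.protocol.ProtocolStatus");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that reading is right, it would fit the observation that the error shows up on the cluster but not on a local checkout: the job jar there has ProtocolStatus visible to the deserializing task, and would also fit the suspicion above about the writable used for metadata.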

