Maybe just 1392? I went ahead and made a patch that should fix this. Feel free to commit or ignore prior to RC2.
On Thu, Jun 14, 2012 at 1:44 AM, Lewis John Mcgibbney <[email protected]> wrote:
> Hi Sebastian,
>
> On Wed, Jun 13, 2012 at 11:30 PM, Sebastian Nagel
> <[email protected]> wrote:
> > I'll managed to perform a crawl with 2.0 and HBase: it rocks, indeed.
> > Much simpler than 1.x (no segments!).
>
> :0)
>
> > % ./bin/nutch readdb -stats
> > WebTable statistics start
> > WebTableReader: java.io.EOFException
> >     at java.io.DataInputStream.readFully(DataInputStream.java:197)
> >     at java.io.DataInputStream.readFully(DataInputStream.java:169)
> >     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
> >     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
> >     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
> >     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
> >     at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:89)
> >     at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:537)
> >     at org.apache.nutch.crawl.WebTableReader.processStatJob(WebTableReader.java:218)
> >     at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:479)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)
> > --> readdb -dump works.
>
> Confirmed and ticket opened as NUTCH-1391
>
> > % ./bin/nutch fetch 1339621550-203073321 -threads 1 -parse
> > Exception in thread "main" java.lang.IllegalArgumentException: arg -parse not recognized
>
> The parse argument was removed in Nutch 2.0 and now throws an
> illegalargumentexception. This is now normal. To enable parsing during
> fetching please set config in nutch-site.xml. The reason that the
> incorrect -parse argument is till in the Usage message, is because I
> was not diligent enough when patching the fetcher CLI aesthetics. I'll
> address this within the issue below as well.
>
> > % ./bin/nutch parse -all -force -resume
> > ParserJob: starting
> > ParserJob: resuming: false        <<< -resume and
> > ParserJob: forced reparse: false  <<< -force obviously ignored ?
> > ParserJob: parsing all
>
> Yes confirmed and ticket opened as NUTCH-1392
>
> > % ./bin/nutch generate
> > --> generates batchid, but should show help as in 1.x ?
> > --> is there an option -topN ?
>
> Yes this is opened in NUTCH-1393. Users may not necessarily wish to
> generate at all, instead wishing to merely find out the GeneratorJob
> CLI options... I will open this just now and fix for 2.1.
>
> > The 2.0 Solr schema and mappings still contain the field "site"
> > which has been removed in 1.x (NUTCH-1232).
> > Should be done also in 2.0: it's easier to maintain only one Solr installation
> > for all Nutch versions.
>
> Logged in NUTCH-1394
>
> Thanks Seb for your contributions here... this is exactly what we are
> after.
>
> Does anyone have issues with running another RC and addressing these
> issues in 2.1?
>
> --
> Lewis

