[jira] [Commented] (NUTCH-1895) run() method in Crawler.java doesnt put Nutch.ARG_BATCH in argMap

Sebastian Nagel (JIRA) Wed, 10 Dec 2014 02:41:26 -0800

    [ https://issues.apache.org/jira/browse/NUTCH-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240889#comment-14240889 ]


Sebastian Nagel commented on NUTCH-1895:
----------------------------------------

Hi [~FeiTian], the class o.a.n.crawl.Crawler was deprecated (a deprecation 
message is/was shown by {{bin/nutch crawl}}) and was later removed with 
NUTCH-1621 (the removal will be included in 2.3). Thanks for your bug report, 
but we are sorry, we cannot fix code which has been removed ;)  Please use the 
script {{bin/crawl}} instead of the removed class.

> run() method in Crawler.java doesnt put Nutch.ARG_BATCH in argMap
> -----------------------------------------------------------------
>
>                 Key: NUTCH-1895
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1895
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb, indexer
>    Affects Versions: 2.2.1
>         Environment: Win7, Solr4.10.1
>            Reporter: FeiTian
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I am using Nutch 2.2.1 and Solr 4.10.1.
> OS: Win7.
> Env: MyEclipse 10.
> JAVA: jdk1.7.0_71
> I am using the command:
>   urls -depth 3 -topN 10 -solr http://localhost:8080/solr/collection2
> to import data into Solr,
> and using:
>   gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
>   gora.sqlstore.jdbc.url=jdbc:mysql://192.168.0.69:3306/nutch?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull
>   gora.sqlstore.jdbc.user=root
>   gora.sqlstore.jdbc.password=123456
> to import data into MySQL.
> But I got a NullPointerException on batchId. Then I found the following.
> In SolrIndexerJob.java, batchId is read from args:
>   @Override
>   public Map<String,Object> run(Map<String,Object> args) throws Exception {
>     String solrUrl = (String)args.get(Nutch.ARG_SOLR);
>     String batchId = (String)args.get(Nutch.ARG_BATCH);
>     NutchIndexWriterFactory.addClassToConf(getConf(), SolrWriter.class);
>     getConf().set(SolrConstants.SERVER_URL, solrUrl);
>     currentJob = createIndexJob(getConf(), "solr-index", batchId);
>     currentJob.waitForCompletion(true);
>     ToolUtil.recordJobStatus(null, currentJob, results);
>     return results;
>   }
> But in Crawler.java, batchId is never put into argMap:
>   @Override
>   public int run(String[] args) throws Exception {
>     if (args.length == 0) {
>       System.out.println("Usage: Crawler (<seedDir> | -continue) [-solr <solrURL>] [-threads n] [-depth i] [-topN N] [-numTasks N]");
>       return -1;
>     }
> ...
>     Map<String,Object> argMap = ToolUtil.toArgMap(
>         Nutch.ARG_THREADS, threads,
>         Nutch.ARG_DEPTH, depth,
>         Nutch.ARG_TOPN, topN,
>         Nutch.ARG_SOLR, solrUrl,
>         Nutch.ARG_SEEDDIR, seedDir,
>         Nutch.ARG_NUMTASKS, numTasks);
>     run(argMap);
>     return 0;
>   }
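
The mismatch described in the report can be illustrated in isolation. The
following is a minimal sketch, not Nutch code: the class BatchArgSketch, the
key strings "solr" and "batch", the helper toArgMap (assumed here to build a
map from alternating key/value arguments, in the spirit of ToolUtil.toArgMap),
the method describeIndexJob, and the batch id "example-batch-1" are all
hypothetical stand-ins. It only shows that a consumer reading a key the
producer never supplied gets null back, and that checking for the key early
gives a clearer failure than a NullPointerException further downstream.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the situation in the report; not part of Nutch.
class BatchArgSketch {
    // Assumed key names, standing in for Nutch.ARG_SOLR and Nutch.ARG_BATCH.
    static final String ARG_SOLR = "solr";
    static final String ARG_BATCH = "batch";

    // Builds a map from alternating key/value arguments (assumed behaviour,
    // written here so the sketch is self-contained).
    static Map<String, Object> toArgMap(Object... pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, Object> map = new HashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            map.put(String.valueOf(pairs[i]), pairs[i + 1]);
        }
        return map;
    }

    // Reads both arguments from the map, like the quoted indexer method,
    // but fails early with a clear message when the batch id is absent.
    static String describeIndexJob(Map<String, Object> args) {
        String solrUrl = (String) args.get(ARG_SOLR);
        String batchId = (String) args.get(ARG_BATCH); // null if never put
        if (batchId == null) {
            throw new IllegalArgumentException("missing argument: " + ARG_BATCH);
        }
        return "index batch " + batchId + " to " + solrUrl;
    }

    public static void main(String[] unused) {
        String solrUrl = "http://localhost:8080/solr/collection2";

        // As in the quoted Crawler code: no batch id is put into the map.
        Map<String, Object> withoutBatch = toArgMap(ARG_SOLR, solrUrl);
        try {
            describeIndexJob(withoutBatch);
        } catch (IllegalArgumentException e) {
            System.out.println("without batch id: " + e.getMessage());
        }

        // With the key supplied, the consumer finds what it expects.
        Map<String, Object> withBatch =
            toArgMap(ARG_SOLR, solrUrl, ARG_BATCH, "example-batch-1");
        System.out.println(describeIndexJob(withBatch));
    }
}
```

Since the Crawler class itself has been removed, this is only meant to make
the reported failure mode concrete; it is not a proposed patch.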



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
