[jira] [Updated] (NUTCH-1979) CrawlDbReader to implement Tool

Markus Jelsma (JIRA) Tue, 31 Mar 2015 02:30:13 -0700

     [ 
https://issues.apache.org/jira/browse/NUTCH-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Markus Jelsma updated NUTCH-1979:
---------------------------------
    Attachment: NUTCH-1979-trunk.patch

This patch also includes the Benchmark class change. I had to switch from 
Configuration to JobConf. 

Before implementing Tool:
{code}
15/03/31 10:50:15 INFO crawl.CrawlDbReader: CrawlDb statistics start: -Dmapred.job.queue.name=crawler
15/03/31 10:50:18 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to m2
15/03/31 10:50:18 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
{code}

With Tool, but with Configuration:

{code}
15/03/31 11:03:22 INFO crawl.CrawlDbReader: CrawlDb statistics start: memex/crawl/crawldb
15/03/31 11:03:24 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to m2
15/03/31 11:03:24 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
{code}

Nutch no longer mistakes the -D parameter for the CrawlDb path, but Hadoop 
still doesn't pick it up.

{code}
15/03/31 11:10:12 INFO crawl.CrawlDbReader: CrawlDb statistics start: memex/crawl/crawldb
15/03/31 11:10:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to m2
15/03/31 11:10:16 INFO mapred.FileInputFormat: Total input paths to process : 8
{code}

With JobConf, it works!
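
For anyone wondering why the first log shows the -D parameter as the CrawlDb 
path: without ToolRunner nothing strips the generic options, so the tool sees 
-Dmapred.job.queue.name=crawler as its first positional argument. Below is a 
rough, self-contained sketch of that splitting step. It is not Hadoop's 
GenericOptionsParser and not the code in the patch; the class 
GenericOptionSketch, its parse method, and the -stats command line are made 
up for illustration, and only the -D handling is modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the generic option handling that ToolRunner
// performs before calling Tool.run(); only -D properties are modelled.
public class GenericOptionSketch {

    /** Holds the -D properties and whatever is left for the tool itself. */
    static final class Parsed {
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /** Splits "-Dkey=value" and "-D key=value" from the tool's own arguments. */
    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String keyValue = null;
            if (arg.equals("-D") && i + 1 < args.length
                    && args[i + 1].indexOf('=') > 0) {
                keyValue = args[++i];          // "-D key=value" form
            } else if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                keyValue = arg.substring(2);   // "-Dkey=value" form
            }
            if (keyValue != null) {
                int eq = keyValue.indexOf('=');
                parsed.properties.put(keyValue.substring(0, eq),
                        keyValue.substring(eq + 1));
            } else {
                parsed.remaining.add(arg);     // left for the tool to interpret
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Illustrative command line, not copied from the issue.
        String[] cli = {"-Dmapred.job.queue.name=crawler",
                        "memex/crawl/crawldb", "-stats"};

        // Without any option parsing the tool takes cli[0] as the CrawlDb path.
        System.out.println("unparsed first argument: " + cli[0]);

        // With the -D options stripped, the first remaining argument is the path.
        Parsed parsed = parse(cli);
        System.out.println("properties: " + parsed.properties);
        System.out.println("first tool argument: " + parsed.remaining.get(0));
    }
}
```

In real Hadoop code the parsed properties end up in the Configuration handed 
to the Tool, which is presumably why the job has to be built from that 
configuration for the queue name to reach job submission.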

> CrawlDbReader to implement Tool
> -------------------------------
>
>                 Key: NUTCH-1979
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1979
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb
>    Affects Versions: 1.9
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.10
>
>         Attachments: NUTCH-1979-trunk.patch, NUTCH-1979-trunk.patch, 
> NUTCH-1979-trunk.patch
>
>
> Evident, and a must-have when running on Hadoop 2.x with named queues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
