[ https://issues.apache.org/jira/browse/NUTCH-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1979:
---------------------------------
Attachment: NUTCH-1979-trunk.patch
This patch also includes the Benchmark class change. I had to switch the code
from Configuration to JobConf.
Before implementing Tool:
{code}
15/03/31 10:50:15 INFO crawl.CrawlDbReader: CrawlDb statistics start: -Dmapred.job.queue.name=crawler
15/03/31 10:50:18 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to m2
15/03/31 10:50:18 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
{code}
With Tool, but still using Configuration:
{code}
15/03/31 11:03:22 INFO crawl.CrawlDbReader: CrawlDb statistics start: memex/crawl/crawldb
15/03/31 11:03:24 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to m2
15/03/31 11:03:24 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
{code}
Nutch now ignores the -D parameter, but Hadoop still doesn't pick it up.
{code}
15/03/31 11:10:12 INFO crawl.CrawlDbReader: CrawlDb statistics start: memex/crawl/crawldb
15/03/31 11:10:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to m2
15/03/31 11:10:16 INFO mapred.FileInputFormat: Total input paths to process : 8
{code}
With JobConf, it works!
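
For anyone wondering why the first log shows the -D argument as the CrawlDb path: a rough, standard-library-only sketch of the split that generic option parsing performs is below. This is not the patch and not Hadoop code. The class GenericOptionsSketch, its Parsed holder and the argument array in main are hypothetical names and values picked for illustration, and it only handles the -Dkey=value and "-D key=value" forms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for generic option handling; not Nutch or Hadoop code. */
public class GenericOptionsSketch {

    /** Result of the split: -D properties on one side, tool arguments on the other. */
    public static final class Parsed {
        public final Map<String, String> props = new LinkedHashMap<>();
        public final List<String> remaining = new ArrayList<>();
    }

    /** Separates -Dkey=value and "-D key=value" from the remaining arguments. */
    public static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String kv = null;
            if (arg.equals("-D") && i + 1 < args.length && args[i + 1].indexOf('=') > 0) {
                kv = args[++i];          // "-D key=value": consume the next argument
            } else if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                kv = arg.substring(2);   // "-Dkey=value": strip the "-D" prefix
            }
            if (kv == null) {
                parsed.remaining.add(arg);   // not a property, leave it for the tool
            } else {
                int eq = kv.indexOf('=');
                parsed.props.put(kv.substring(0, eq), kv.substring(eq + 1));
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Assumed invocation, modelled on the logs in this comment.
        String[] cli = {"-Dmapred.job.queue.name=crawler", "memex/crawl/crawldb", "-stats"};

        // Without the split, the first argument is mistaken for the CrawlDb path.
        System.out.println("naive crawldb:  " + cli[0]);

        Parsed parsed = parse(cli);
        System.out.println("parsed crawldb: " + parsed.remaining.get(0));
        System.out.println("properties:     " + parsed.props);
    }
}
```

The point is only that the tool should see the arguments after the properties have been stripped off and applied to the job configuration. The sketch does not try to reproduce the difference between Configuration and JobConf seen in the logs above.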
> CrawlDbReader to implement Tool
> -------------------------------
>
> Key: NUTCH-1979
> URL: https://issues.apache.org/jira/browse/NUTCH-1979
> Project: Nutch
> Issue Type: Improvement
> Components: crawldb
> Affects Versions: 1.9
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Minor
> Fix For: 1.10
>
> Attachments: NUTCH-1979-trunk.patch, NUTCH-1979-trunk.patch,
> NUTCH-1979-trunk.patch
>
>
> Evident, and a must-have when running on Hadoop 2.x with named queues.