[
https://issues.apache.org/jira/browse/NUTCH-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596302#comment-13596302
]
Lewis John McGibbney commented on NUTCH-1393:
---------------------------------------------
Hi Lufeng, this patch is not working.
{code}
$ ./bin/nutch generate
Usage: GeneratorJob [-topN N] [-crawlId cid] [-noFilter] [-noNorm]
Infos: All parameters using the default.
Infos: GeneratorJob -topN Long.MAX_VALUE
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob:
{code}
The goal here is to print the usage message only if args.length == 0, and then
exit without starting the job. We also need to improve logging so that it shows
whether normalization is on or off. As you can see above, filtering is logged
to stdout, but nothing is logged for normalization.
I think I tried to hack this a while back but didn't quite get it working.
With regard to the usage message, ParserJob provides a good example:
{code}
Usage: ParserJob (<batchId> | -all) [-crawlId <id>] [-resume] [-force]
    <batchId>     - symbolic batch ID created by Generator
    -crawlId <id> - the id to prefix the schemas to operate on,
                    (default: storage.crawl.id)
    -all          - consider pages from all crawl jobs
    -resume       - resume a previous incomplete job
    -force        - force re-parsing even if a page is already parsed
{code}
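To make the intended behaviour concrete, here is a minimal sketch of what I have in mind. It is not the actual GeneratorJob code: the class name GeneratorUsageSketch, the Options holder, the wording of the flag descriptions, and the "normalizing:" log line are all assumptions for illustration. Only the flag names and the ParserJob-style layout are taken from the output above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual Nutch GeneratorJob source.
public class GeneratorUsageSketch {

    // Option holder; field names are illustrative assumptions.
    public static final class Options {
        public long topN = Long.MAX_VALUE;
        public String crawlId;            // null means "use the configured default"
        public boolean filter = true;
        public boolean normalize = true;
    }

    // Usage text in the ParserJob layout; the descriptions are assumed wording.
    public static String usage() {
        return String.join("\n",
            "Usage: GeneratorJob [-topN N] [-crawlId cid] [-noFilter] [-noNorm]",
            "    -topN N      - maximum number of URLs to select",
            "    -crawlId cid - the id to prefix the schemas to operate on,",
            "                   (default: storage.crawl.id)",
            "    -noFilter    - do not apply URL filters",
            "    -noNorm      - do not apply URL normalizers");
    }

    // Returns the value following a flag, or fails if it is missing.
    static String requireValue(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException(flag + " requires a value");
        }
        return args[i];
    }

    // Parses the four flags shown in the usage line; rejects anything else.
    public static Options parse(String[] args) {
        Options o = new Options();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-topN":
                    o.topN = Long.parseLong(requireValue(args, ++i, "-topN"));
                    break;
                case "-crawlId":
                    o.crawlId = requireValue(args, ++i, "-crawlId");
                    break;
                case "-noFilter":
                    o.filter = false;
                    break;
                case "-noNorm":
                    o.normalize = false;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown argument: " + args[i]);
            }
        }
        return o;
    }

    // Start-up lines that report filtering AND normalization explicitly.
    public static List<String> startupLines(Options o) {
        List<String> lines = new ArrayList<>();
        lines.add("GeneratorJob: starting");
        lines.add("GeneratorJob: filtering: " + o.filter);
        lines.add("GeneratorJob: normalizing: " + o.normalize); // assumed format
        return lines;
    }

    // Prints usage and returns -1 when no arguments are given, before any work starts.
    public static int run(String[] args) {
        if (args.length == 0) {
            System.out.println(usage());
            return -1;
        }
        Options o;
        try {
            o = parse(args);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.err.println(usage());
            return -1;
        }
        for (String line : startupLines(o)) {
            System.out.println(line);
        }
        // The real job would be configured and submitted here.
        return 0;
    }

    public static void main(String[] args) {
        run(new String[0]);                            // prints usage only
        run(new String[] {"-topN", "10", "-noNorm"});  // prints start-up lines
    }
}
```

The important part is the early return in run(): with no arguments nothing is generated, and once the job does start, both the filtering and normalization settings are logged.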
> Display consistent usage of GeneratorJob with 1.X
> -------------------------------------------------
>
> Key: NUTCH-1393
> URL: https://issues.apache.org/jira/browse/NUTCH-1393
> Project: Nutch
> Issue Type: Bug
> Components: administration gui, generator
> Affects Versions: nutchgora
> Reporter: Lewis John McGibbney
> Fix For: 2.2
>
> Attachments: NUTCH-1393.patch
>
>
> If we pass the generate argument to the nutch script, the Generator
> automatically springs into action and begins generating fetchlists. This
> should not be the case; instead, it should print the traditional usage message
> to stdout. An example is below:
> {code}
> lewis@lewis:~/ASF/nutchgora/runtime/local$ ./bin/nutch generate
> GeneratorJob: Selecting best-scoring urls due for fetch.
> GeneratorJob: starting
> GeneratorJob: filtering: true
> GeneratorJob: done
> GeneratorJob: generated batch id: 1339628223-1694200031
> {code}
> All I wanted to do was get the usage params printed to stdout, but instead it
> generated my batch willy-nilly.