[ 
https://issues.apache.org/jira/browse/NUTCH-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596302#comment-13596302
 ] 

Lewis John McGibbney commented on NUTCH-1393:
---------------------------------------------

Hi Lufeng, this patch is not working.

{code}
$ ./bin/nutch generate
Usage: GeneratorJob [-topN N] [-crawlId cid] [-noFilter] [-noNorm]
Infos: All parameters using the default.
Infos: GeneratorJob -topN Long.MAX_VALUE
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob:
{code}

The intent is to print the usage message and exit only if args.length==0. As the
output above shows, the patch prints the usage and then starts the job anyway.
We also need to log whether normalization is on or off: filtering is logged to
stdout, but normalization is not logged at all.
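
A minimal sketch of that behaviour, not the actual GeneratorJob code. The class
name GeneratorArgsSketch, the run() helper, the "normalizing" log wording, and
the choice to print usage for unrecognised arguments are all assumptions made
for illustration; only the flags come from the usage line in the output above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the argument handling in GeneratorJob.
class GeneratorArgsSketch {

    // Flags copied from the usage line shown in the patch output.
    static final String USAGE =
        "Usage: GeneratorJob [-topN N] [-crawlId cid] [-noFilter] [-noNorm]";

    /** Returns the lines that would be written to stdout for the given args. */
    static List<String> run(String[] args) {
        List<String> out = new ArrayList<>();

        // No arguments at all: print usage and stop, do not start the job.
        if (args.length == 0) {
            out.add(USAGE);
            return out;
        }

        boolean filter = true;
        boolean normalize = true;
        long topN = Long.MAX_VALUE;
        String crawlId = null;

        for (int i = 0; i < args.length; i++) {
            if ("-topN".equals(args[i]) && i + 1 < args.length) {
                // Throws NumberFormatException if the value is not a number.
                topN = Long.parseLong(args[++i]);
            } else if ("-crawlId".equals(args[i]) && i + 1 < args.length) {
                crawlId = args[++i];
            } else if ("-noFilter".equals(args[i])) {
                filter = false;
            } else if ("-noNorm".equals(args[i])) {
                normalize = false;
            } else {
                // Assumption: unknown or incomplete options also yield usage.
                out.add(USAGE);
                return out;
            }
        }

        // Log both switches explicitly so the operator can see what is active.
        out.add("GeneratorJob: filtering: " + filter);
        out.add("GeneratorJob: normalizing: " + normalize);
        return out;
    }

    public static void main(String[] args) {
        run(args).forEach(System.out::println);
    }
}
```

With no arguments this prints only the usage line; with -noNorm it prints
"filtering: true" followed by "normalizing: false".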

I think I tried to hack this a while back but didn't quite get it.

With regard to the usage message, ParserJob provides a good example:

{code}
Usage: ParserJob (<batchId> | -all) [-crawlId <id>] [-resume] [-force]
    <batchId>     - symbolic batch ID created by Generator
    -crawlId <id> - the id to prefix the schemas to operate on, 
                    (default: storage.crawl.id)
    -all          - consider pages from all crawl jobs
    -resume       - resume a previous incomplete job
    -force        - force re-parsing even if a page is already parsed

{code} 
                
> Display consistent usage of GeneratorJob with 1.X
> -------------------------------------------------
>
>                 Key: NUTCH-1393
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1393
>             Project: Nutch
>          Issue Type: Bug
>          Components: administration gui, generator
>    Affects Versions: nutchgora
>            Reporter: Lewis John McGibbney
>             Fix For: 2.2
>
>         Attachments: NUTCH-1393.patch
>
>
> If we pass the generate argument to the nutch script, the Generator 
> auto-springs into action and begins generating fetchlists. This should not be 
> the case; instead it should print the traditional usage to stdout. An example 
> is below:
> {code}
> lewis@lewis:~/ASF/nutchgora/runtime/local$ ./bin/nutch generate
> GeneratorJob: Selecting best-scoring urls due for fetch.
> GeneratorJob: starting
> GeneratorJob: filtering: true
> GeneratorJob: done
> GeneratorJob: generated batch id: 1339628223-1694200031
> {code}
> All I wanted to do was get the usage params printed to stdout but instead it 
> generated my batch willy nilly.  

