[
https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502799#comment-13502799
]
Lewis John McGibbney commented on NUTCH-1370:
---------------------------------------------
Tested against medium-sized seed lists and works a charm. I like the counters.
Seb, thanks for this contrib.
Committed @revision 1412566 in 2.x.
This also covers NUTCH-1471.
I also added the correct mapping for the host table in
gora-cassandra-mapping.xml.
> Expose exact number of urls injected @runtime
> ----------------------------------------------
>
> Key: NUTCH-1370
> URL: https://issues.apache.org/jira/browse/NUTCH-1370
> Project: Nutch
> Issue Type: Improvement
> Components: injector
> Affects Versions: nutchgora, 1.5
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.6, 2.2
>
> Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch,
> NUTCH-1370-2.x-v2.patch, NUTCH-1370-2.x-v3.patch
>
>
> Example: When using trunk, currently we see
> {code}
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls
> 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in:
> {code}
> I would like to see
> {code}
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
> 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls
> 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Injected N urls to crawl/crawldb
> 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in:
> {code}
> This would make debugging easier and would help those who end up getting
> {code}
> 2012-05-22 09:04:04,850 WARN crawl.Generator - Generator: 0 records selected for fetching, exiting ...
> {code}
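
For illustration only, here is a minimal plain-Java sketch of the kind of count the requested log line would report. It is not the committed patch and does not use the Nutch or Hadoop APIs. The class name InjectCountSketch, its methods, and the rule that blank lines and lines starting with '#' are skipped are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Nutch code: counts the seed lines that would be
// injected and formats the summary line requested in the issue.
class InjectCountSketch {

    // Assumption for this sketch: blank lines and '#' comment lines are skipped.
    static boolean isInjectable(String line) {
        String trimmed = line.trim();
        return !trimmed.isEmpty() && !trimmed.startsWith("#");
    }

    // Number of seed lines that pass the filter above.
    static long countInjectable(List<String> seedLines) {
        return seedLines.stream().filter(InjectCountSketch::isInjectable).count();
    }

    // Message text modelled on the "Injected N urls to crawl/crawldb" line above.
    static String summary(long injected, String crawlDb) {
        return "Injector: Injected " + injected + " urls to " + crawlDb;
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList(
            "http://nutch.apache.org/",
            "# a commented-out seed",
            "",
            "http://example.org/");
        System.out.println(summary(countInjectable(seeds), "crawl/crawldb"));
    }
}
```

With a count like this in the log, a Generator run that selects 0 records can be checked quickly against how many urls were actually injected.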