Markus Jelsma created NUTCH-1600:
------------------------------------
Summary: Injector overwrite does not always work properly
Key: NUTCH-1600
URL: https://issues.apache.org/jira/browse/NUTCH-1600
Project: Nutch
Issue Type: Bug
Components: injector
Affects Versions: 1.7
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Fix For: 1.8
db.injector.update works as it should, but db.injector.overwrite doesn't always
seem to properly overwrite the record. This issue has existed for some time, and
we've already fixed it in our distribution of Nutch.
This record has just been updated (interval).
{code}
Injector: starting at 2013-07-03 10:34:15
Injector: crawlDb: crawl/crawldb
Injector: urlDir: seeds
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 0
Injector: total number of urls injected after normalization and filtering: 9
Injector: Merging injected urls into crawl db.
Injector: finished at 2013-07-03 10:34:21, elapsed: 00:00:05
URL: url
Version: 7
Status: 2 (db_fetched)
Fetch time: Fri Jul 05 12:11:44 CEST 2013
Modified time: Fri Jun 28 12:11:44 CEST 2013
Retries since fetch: 0
Retry interval: 604800 seconds (7 days)
Score: 0.0
Signature: ba29ef3e680323a6d0da74c156800e03
Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
{code}
If we now overwrite the record, nothing happens. With this patch installed, it
overwrites the record as it should and also logs the update and overwrite
switches to the console:
{code}
Injector: starting at 2013-07-03 10:36:30
Injector: crawlDb: crawl/crawldb
Injector: urlDir: seeds
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 0
Injector: total number of urls injected after normalization and filtering: 9
Injector: Merging injected urls into crawl db.
Injector: overwrite: true
Injector: update: false
Injector: finished at 2013-07-03 10:36:36, elapsed: 00:00:05
URL: url
Version: 7
Status: 1 (db_unfetched)
Fetch time: Wed Jul 03 10:36:30 CEST 2013
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 14000 seconds (0 days)
Score: 1.0
Signature: null
Metadata: fixedInterval: 14000.0
{code}
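For illustration only, the expected behaviour of the two switches can be modelled roughly as below. This is a simplified sketch, not the attached patch and not Nutch's actual Injector reducer: the `Record` class, the `merge` method and the exact fields copied on update are hypothetical stand-ins, and only the property names db.injector.overwrite and db.injector.update come from this report.

```java
import java.util.HashMap;
import java.util.Map;

public class InjectorMergeSketch {

    /** Hypothetical stand-in for a CrawlDb record; this is NOT Nutch's CrawlDatum. */
    static final class Record {
        String status;
        int retryIntervalSeconds;
        float score;
        String signature;
        final Map<String, String> metadata = new HashMap<>();

        Record(String status, int retryIntervalSeconds, float score, String signature) {
            this.status = status;
            this.retryIntervalSeconds = retryIntervalSeconds;
            this.score = score;
            this.signature = signature;
        }
    }

    /**
     * Decides which record survives the merge step.
     * Assumption: overwrite takes precedence over update, and update only
     * copies interval, score and metadata onto the existing record.
     */
    static Record merge(Record existing, Record injected, boolean overwrite, boolean update) {
        if (existing == null) {
            return injected;            // new URL: nothing to overwrite
        }
        if (injected == null) {
            return existing;            // URL not in the seed list: keep as-is
        }
        if (overwrite) {
            return injected;            // discard the old record entirely
        }
        if (update) {
            existing.retryIntervalSeconds = injected.retryIntervalSeconds;
            existing.score = injected.score;
            existing.metadata.putAll(injected.metadata);
        }
        return existing;                // fetch status and signature are preserved
    }

    public static void main(String[] args) {
        // Values taken from the two record dumps in this report.
        Record existing = new Record("db_fetched", 604800, 0.0f,
                "ba29ef3e680323a6d0da74c156800e03");
        existing.metadata.put("Content-Type", "text/html");

        Record injected = new Record("db_unfetched", 14000, 1.0f, null);
        injected.metadata.put("fixedInterval", "14000.0");

        Record result = merge(existing, injected, true, false);
        System.out.println(result.status + " " + result.retryIntervalSeconds
                + " " + result.signature);
    }
}
```

With overwrite set to true, the surviving record is the injected one, which matches the second dump above: status db_unfetched, a 14000 second interval and a null signature. With both switches false, the existing db_fetched record is returned unchanged.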