[
https://issues.apache.org/jira/browse/NUTCH-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051968#comment-14051968
]
Sebastian Nagel edited comment on NUTCH-1502 at 7/3/14 10:22 PM:
-----------------------------------------------------------------
Patch which adds the following unit tests:
* test matrix of state transitions with
** CrawlDbReducer and InjectReducer
** Default and AdaptiveFetchSchedule
* fetch_gone -> db_gone (NUTCH-1245)
* not modified time (cf. NUTCH-933)
* fetch_retry -> db_gone after max retries (NUTCH-578)
* immediate refetch by sync_delta of AdaptiveFetchSchedule (NUTCH-1564)
* signature reset / erroneous db_notmodified (NUTCH-1422)
The last four points are open issues; the corresponding tests are in a
separate TODO test class or marked as such. The tests should make it easier to
find solutions for these issues because they are now reproducible. That's the
main improvement: the tests log a lot of information, which makes it possible
to understand what's going wrong. Since these problems only show up after a
long time, they are hard to investigate in real crawls (dozens of segments
would need to be checked).
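
To illustrate the idea of a transition matrix test, here is a minimal, self-contained sketch. It is not the code from the patch and does not use Nutch classes: the enums, the `update` method and the `MAX_RETRIES` value are simplified, hypothetical stand-ins for the CrawlDatum states, the reducer logic and the retry limit. It walks every combination of old state and fetch result, logs each transition, and separately checks that repeated fetch_retry ends in db_gone.

```java
import java.util.EnumMap;
import java.util.Map;

public class TransitionMatrixSketch {

    // Hypothetical, simplified stand-ins for CrawlDb and fetch states.
    enum DbState { DB_UNFETCHED, DB_FETCHED, DB_NOTMODIFIED, DB_GONE }
    enum FetchState { FETCH_SUCCESS, FETCH_NOTMODIFIED, FETCH_RETRY, FETCH_GONE }

    // Assumed retry limit for this sketch only.
    static final int MAX_RETRIES = 3;

    // A CrawlDb entry reduced to its state and retry counter.
    static final class Entry {
        final DbState state;
        final int retries;

        Entry(DbState state, int retries) {
            this.state = state;
            this.retries = retries;
        }
    }

    // Toy update rule standing in for the reducer under test.
    static Entry update(Entry old, FetchState fetch) {
        switch (fetch) {
            case FETCH_SUCCESS:
                return new Entry(DbState.DB_FETCHED, 0);
            case FETCH_NOTMODIFIED:
                return new Entry(DbState.DB_NOTMODIFIED, 0);
            case FETCH_GONE:
                return new Entry(DbState.DB_GONE, old.retries);
            case FETCH_RETRY: {
                int retries = old.retries + 1;
                DbState next = retries < MAX_RETRIES
                        ? DbState.DB_UNFETCHED
                        : DbState.DB_GONE;
                return new Entry(next, retries);
            }
            default:
                throw new IllegalArgumentException("unknown fetch state: " + fetch);
        }
    }

    // Checks every (old state, fetch result) pair and returns the failure count.
    static int checkMatrix() {
        Map<FetchState, DbState> expected = new EnumMap<>(FetchState.class);
        expected.put(FetchState.FETCH_SUCCESS, DbState.DB_FETCHED);
        expected.put(FetchState.FETCH_NOTMODIFIED, DbState.DB_NOTMODIFIED);
        expected.put(FetchState.FETCH_RETRY, DbState.DB_UNFETCHED);
        expected.put(FetchState.FETCH_GONE, DbState.DB_GONE);

        int failures = 0;
        for (DbState from : DbState.values()) {
            for (FetchState fetch : FetchState.values()) {
                Entry result = update(new Entry(from, 0), fetch);
                boolean ok = result.state == expected.get(fetch);
                // Log every transition so a failure can be traced.
                System.out.println(from + " + " + fetch + " -> " + result.state
                        + (ok ? "" : "  UNEXPECTED"));
                if (!ok) {
                    failures++;
                }
            }
        }
        return failures;
    }

    // Applies fetch_retry n times to a fetched entry and returns the final state.
    static DbState afterRetries(int n) {
        Entry entry = new Entry(DbState.DB_FETCHED, 0);
        for (int i = 0; i < n; i++) {
            entry = update(entry, FetchState.FETCH_RETRY);
        }
        return entry.state;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + checkMatrix());
        System.out.println("after " + MAX_RETRIES + " retries: " + afterRetries(MAX_RETRIES));
    }
}
```

In a real test the toy `update` rule would be replaced by the reducer and fetch schedule under test, and the expected states would come from the documented behaviour instead of a hard-coded map.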
> Test for CrawlDatum state transitions
> -------------------------------------
>
> Key: NUTCH-1502
> URL: https://issues.apache.org/jira/browse/NUTCH-1502
> Project: Nutch
> Issue Type: Improvement
> Components: crawldb
> Affects Versions: 1.7, 2.2
> Reporter: Sebastian Nagel
> Fix For: 2.4, 1.9
>
> Attachments: NUTCH-1502-trunk-v1.patch
>
>
> An exhaustive test to check the matrix of CrawlDatum state transitions
> (CrawlStatus in 2.x) would be useful to detect errors, especially for
> continuous crawls where the number of possible transitions is quite large.
> Additional factors with an impact on state transitions (retry counters,
> static and dynamic intervals) are also tested.
> The tests will help to address NUTCH-578 and NUTCH-1245. See the latter
> for a first rough patch.