[ 
https://issues.apache.org/jira/browse/NUTCH-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051968#comment-14051968
 ] 

Sebastian Nagel edited comment on NUTCH-1502 at 7/3/14 10:22 PM:
-----------------------------------------------------------------

Patch which adds the following unit tests:
* test matrix of state transitions with
** CrawlDbReducer and InjectReducer
** Default and AdaptiveFetchSchedule
* fetch_gone -> db_gone (NUTCH-1245)
* not modified time (cf. NUTCH-933)
* fetch_retry -> db_gone after max retries (NUTCH-578)
* immediate refetch by sync_delta of AdaptiveFetchSchedule (NUTCH-1564)
* signature reset / erroneous db_notmodified (NUTCH-1422)

The last four points are open issues; the corresponding tests are in a 
separate TODO test class or marked as such. The tests should make it easier to 
find solutions for these issues because they are now reproducible. That's the 
main improvement: the tests log a lot of information, which makes it possible 
to understand what's going wrong. Since these problems show up only after a 
long time, they are hard to investigate in real crawls (one would need to check 
dozens of segments).
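
To illustrate the idea of a transition matrix test, here is a minimal, 
self-contained sketch. It is not taken from the patch and does not use Nutch 
classes: the class TransitionMatrixSketch, the reduced status sets, the 
MAX_RETRIES value and the transition rules in next() are all assumptions made 
up for illustration. It only shows the general shape: enumerate every 
combination of old state, fetch result and retry count, and log each resulting 
transition so that an unexpected one can be spotted.

```java
import java.util.ArrayList;
import java.util.List;

public class TransitionMatrixSketch {

    // Reduced status sets. The names echo the statuses mentioned above,
    // but this is a toy model, not Nutch's CrawlDatum.
    public enum DbState { DB_UNFETCHED, DB_FETCHED, DB_GONE, DB_NOTMODIFIED }
    public enum FetchState { FETCH_SUCCESS, FETCH_RETRY, FETCH_GONE, FETCH_NOTMODIFIED }

    // Hypothetical retry limit, chosen only for this sketch.
    public static final int MAX_RETRIES = 3;

    // Toy transition rules (assumptions, not the behaviour of CrawlDbReducer).
    public static DbState next(DbState from, FetchState fetch, int retries) {
        switch (fetch) {
            case FETCH_SUCCESS:
                return DbState.DB_FETCHED;
            case FETCH_GONE:
                return DbState.DB_GONE;
            case FETCH_NOTMODIFIED:
                // A never-fetched entry has nothing to be "not modified" against.
                return from == DbState.DB_UNFETCHED
                        ? DbState.DB_FETCHED
                        : DbState.DB_NOTMODIFIED;
            case FETCH_RETRY:
                // Keep the old state until the retry limit is reached.
                return retries >= MAX_RETRIES ? DbState.DB_GONE : from;
            default:
                throw new IllegalArgumentException("unknown fetch state: " + fetch);
        }
    }

    // Walk the full matrix and record one log line per transition.
    public static List<String> runMatrix() {
        List<String> log = new ArrayList<>();
        for (DbState from : DbState.values()) {
            for (FetchState fetch : FetchState.values()) {
                for (int retries = 0; retries <= MAX_RETRIES; retries++) {
                    DbState to = next(from, fetch, retries);
                    log.add(from + " + " + fetch
                            + " (retries=" + retries + ") -> " + to);
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        runMatrix().forEach(System.out::println);
    }
}
```

A real test would replace next() with a call into the reducer and fetch 
schedule under test and compare each logged transition against the expected 
one.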



> Test for CrawlDatum state transitions
> -------------------------------------
>
>                 Key: NUTCH-1502
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1502
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb
>    Affects Versions: 1.7, 2.2
>            Reporter: Sebastian Nagel
>             Fix For: 2.4, 1.9
>
>         Attachments: NUTCH-1502-trunk-v1.patch
>
>
> An exhaustive test to check the matrix of CrawlDatum state transitions 
> (CrawlStatus in 2.x) would be useful to detect errors esp. for continuous 
> crawls where the number of possible transitions is quite large. Additional 
> factors with impact on state transitions (retry counters, static and dynamic 
> intervals) are also tested.
> The tests will help to address NUTCH-578 and NUTCH-1245. See the latter 
> for a first sketchy patch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
