[ https://issues.apache.org/jira/browse/NUTCH-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved NUTCH-838.
-------------------------------------

    Resolution: Fixed

- Patch applied to trunk in r960246 and backported to 1.2-branch in r960248. I 
had to make some minor CR-LF (line-ending) modifications and skip patching a 
few files that have been removed from the latest trunk. Thanks, Jeroen!

> Add timing information to all Tool classes
> ------------------------------------------
>
>                 Key: NUTCH-838
>                 URL: https://issues.apache.org/jira/browse/NUTCH-838
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, generator, indexer, linkdb, parser
>    Affects Versions: 1.1
>         Environment: JDK 1.6, Linux & Windows
>            Reporter: Jeroen van Vianen
>            Assignee: Chris A. Mattmann
>             Fix For: 1.2, 2.0
>
>         Attachments: timings.patch
>
>
> I am happily trying to crawl a few hundred URLs incrementally. Performance 
> suddenly degrades once the index reaches approximately 25000 URLs.
> At first, each batch of inject, (generate, fetch, parse, updatedb) * 3, 
> invertlinks, solrindex and solrdedup takes approximately half an hour with 
> topN 500, but elapsed times now increase with every batch: 00h45m, 01h15m, 
> 01h30m.
> As I'm not sure which of the phases takes so much time, I decided to add 
> start and finish times to all classes that implement Tool, so that I at least 
> get a feel for the timings and can review them in a log file.
> I am using pretty old hardware, but I plan to recrawl these URLs on a 
> regular basis, and if every iteration takes more and more time, index 
> updates will be few and far between :-(
> I added timing information to *all* Tool classes for consistency, although 
> only 10 or so Tools are really interesting.
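The issue asks for start and finish times to be logged by every Tool class. A minimal sketch of that pattern follows, assuming plain JDK classes and printing to stdout instead of Nutch's own logging. The class TimingSketch and its methods elapsedTime and timed are hypothetical names for illustration, not the code in timings.patch.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.Callable;

/**
 * Illustrative sketch only: TimingSketch, elapsedTime and timed are
 * hypothetical names, not the code from timings.patch.
 */
public class TimingSketch {

    // Formats the difference between two System.currentTimeMillis() values as HH:mm:ss.
    public static String elapsedTime(long startMillis, long endMillis) {
        long totalSeconds = (endMillis - startMillis) / 1000L;
        long hours = totalSeconds / 3600L;
        long minutes = (totalSeconds % 3600L) / 60L;
        long seconds = totalSeconds % 60L;
        return String.format("%02d:%02d:%02d", hours, minutes, seconds);
    }

    // Runs one phase and prints its start time, finish time and elapsed time,
    // even if the phase throws.
    public static <T> T timed(String phase, Callable<T> body) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        long start = System.currentTimeMillis();
        System.out.println(phase + ": starting at " + fmt.format(new Date(start)));
        try {
            return body.call();
        } finally {
            long end = System.currentTimeMillis();
            System.out.println(phase + ": finished at " + fmt.format(new Date(end))
                + ", elapsed: " + elapsedTime(start, end));
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a Tool's run(): a hypothetical "generate" phase that sleeps briefly.
        int exitCode = timed("generate", new Callable<Integer>() {
            public Integer call() throws Exception {
                Thread.sleep(50);
                return 0;
            }
        });
        System.out.println("exit code: " + exitCode);
    }
}
```

For example, elapsedTime(0L, 2700000L) yields "00:45:00", the first of the batch durations quoted above. In a Hadoop Tool, the same start/finish logging would presumably wrap the body of int run(String[] args); grepping the log for "elapsed" then shows which phase is growing from batch to batch.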

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
