[ https://issues.apache.org/jira/browse/NUTCH-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884997#action_12884997 ]
Hudson commented on NUTCH-838:
------------------------------

Integrated in Nutch-trunk #1197 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1197/])
    - fix for NUTCH-838 Add timing information to all Tool classes

> Add timing information to all Tool classes
> ------------------------------------------
>
>                 Key: NUTCH-838
>                 URL: https://issues.apache.org/jira/browse/NUTCH-838
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, generator, indexer, linkdb, parser
>    Affects Versions: 1.1
>         Environment: JDK 1.6, Linux & Windows
>            Reporter: Jeroen van Vianen
>            Assignee: Chris A. Mattmann
>             Fix For: 1.2, 2.0
>
>         Attachments: timings.patch
>
>
> Am happily trying to crawl a few hundred URLs incrementally. Performance
> suddenly degrades once the index reaches approximately 25000 URLs.
> At first, each batch of inject, (generate, fetch, parse, updatedb) * 3,
> invertlinks, solrindex and solrdedup takes approximately half an hour with
> topN 500, but elapsed times now grow with every batch: 00h45m, 01h15m, 01h30m.
> As I'm uncertain which of the phases takes so much time, I decided to add
> start and finish times to all classes that implement Tool, so that I at least
> get a feel for where the time goes and can review the timings in a log file.
> I am using pretty old hardware, but I plan to recrawl these URLs on a regular
> basis, and if every iteration takes more and more time, index updates will be
> few and far between :-(
> I added timing information to *all* Tool classes for consistency, even though
> only 10 or so Tools are really interesting.

--
This message is automatically generated by JIRA.
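The contents of timings.patch are not reproduced in this message. As a rough illustration of the idea only, here is a minimal Java sketch assuming a hypothetical helper class ToolTiming (not part of Nutch) that wraps a tool's work, prints start and finish times, and formats the elapsed time in the same hours/minutes style the report uses (e.g. 00h45m).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper, NOT part of Nutch: a sketch of the start/finish
 * timing described in NUTCH-838.
 */
public final class ToolTiming {

    private ToolTiming() {}

    /** Formats the interval between two millisecond timestamps as HHhMMmSSs. */
    public static String elapsed(long startMillis, long endMillis) {
        // Clamp negative intervals (clock skew) to zero.
        long totalSeconds = Math.max(0L, endMillis - startMillis) / 1000L;
        long hours = totalSeconds / 3600L;
        long minutes = (totalSeconds % 3600L) / 60L;
        long seconds = totalSeconds % 60L;
        return String.format("%02dh%02dm%02ds", hours, minutes, seconds);
    }

    /** Runs a task, printing its start time, finish time and elapsed time. */
    public static <T> T timed(String toolName, Callable<T> task) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        long start = System.currentTimeMillis();
        System.out.println(toolName + ": starting at " + fmt.format(new Date(start)));
        try {
            return task.call();
        } finally {
            // Runs even if the task throws, so the finish time is always logged.
            long end = System.currentTimeMillis();
            System.out.println(toolName + ": finished at " + fmt.format(new Date(end))
                + ", elapsed: " + elapsed(start, end));
        }
    }

    public static void main(String[] args) throws Exception {
        // Anonymous class instead of a lambda so this also compiles on JDK 1.6.
        int exitCode = timed("Generator", new Callable<Integer>() {
            @Override
            public Integer call() {
                return 0; // a real tool would do its work here
            }
        });
        System.out.println("exit code " + exitCode);
    }
}
```

In an actual Tool implementation the wrapped task would be the body of run(String[]), and the output would go to the tool's logger rather than System.out; both of those details are assumptions of this sketch, not a description of the committed fix.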