As you can see from the log below, the percentage complete stays very low
until it suddenly jumps to fully complete. This started happening with some
segments about a week ago; other segments work through their full list of
~10,000 URLs. It occurs whether or not I set a generate.max.per.host
directive. Plugins are as defined by default.

There are no errors logged at either the jobtracker or the tasktracker.
It happens whether I use a datanode/namenode configuration or the local
filesystem.

A full log for this task is attached.

051110 214542 task_m_8pwl0q  Parsing 
[http://www.nebrodibandb.it/chiesemonum.html] with [EMAIL PROTECTED]
051110 214543 task_m_8pwl0q  Parsing 
[http://www.nyc-architecture.com/SOH/SOH017.htm] with [EMAIL PROTECTED]
051110 214543 task_m_8pwl0q  Parsing 
[http://www.town.ocean-city.md.us/Recreation/Forms/CampRegistrationForm.html] 
with [EMAIL PROTECTED]
051110 214543 task_m_8pwl0q 0.0022044207% 470 pages, 71 errors, 9.4 pages/s, 
781 kb/s, 
051110 214544 task_m_8pwl0q 0.0022044207% 470 pages, 71 errors, 9.2 pages/s, 
766 kb/s, 
051110 214545 task_m_8pwl0q 0.0022044207% 470 pages, 71 errors, 9.0 pages/s, 
751 kb/s, 
051110 214546 task_m_8pwl0q 0.0022044207% 470 pages, 71 errors, 8.9 pages/s, 
737 kb/s, 
051110 214547 task_m_8pwl0q org.apache.nutch.protocol.RetryLater: Exceeded 
http.max.delays: retry later.
051110 214547 task_m_8pwl0q     at 
org.apache.nutch.protocol.httpclient.Http.blockAddr(Http.java:133)
051110 214547 task_m_8pwl0q     at 
org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:201)
051110 214547 task_m_8pwl0q     at 
org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:182)
051110 214547 task_m_8pwl0q     at 
org.apache.nutch.crawl.Fetcher$FetcherThread.run(Fetcher.java:114)
051110 214547 task_m_8pwl0q  fetch of 
http://www.thisisjersey.com/section/familynotices.html failed with: 
org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
051110 214547 task_m_8pwl0q org.apache.nutch.protocol.RetryLater: Exceeded 
http.max.delays: retry later.
051110 214547 task_m_8pwl0q     at 
org.apache.nutch.protocol.httpclient.Http.blockAddr(Http.java:133)
051110 214547 task_m_8pwl0q     at 
org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:201)
051110 214547 task_m_8pwl0q     at 
org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:182)
051110 214547 task_m_8pwl0q     at 
org.apache.nutch.crawl.Fetcher$FetcherThread.run(Fetcher.java:114)
051110 214547 task_m_8pwl0q  fetch of 
http://www.thisisjersey.com/section/sale.html failed with: 
org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
051110 214547 task_m_8pwl0q  Parsing 
[http://www.thisisjersey.com/itprofessionals/] with [EMAIL PROTECTED]
051110 214548 task_m_8pwl0q 0.0022044207% 471 pages, 73 errors, 8.7 pages/s, 
727 kb/s, 
051110 214549 task_m_8pwl0q 0.0022044207% 471 pages, 73 errors, 8.6 pages/s, 
713 kb/s, 
051110 214550 task_m_8pwl0q  Parsing [http://www.geocities.com/redzombies/] 
with [EMAIL PROTECTED]
051110 214550 task_m_8pwl0q 0.0022044207% 471 pages, 73 errors, 8.6 pages/s, 
713 kb/s, 
051110 214551 task_m_8pwl0q 0.0022044207% 472 pages, 73 errors, 8.3 pages/s, 
689 kb/s, 
051110 214551 task_m_8pwl0q  Parsing 
[http://www.communitytransport.com/events/2005/pdfs/brochure05.pdf] with [EMAIL 
PROTECTED]
051110 214552 task_m_8pwl0q 0.0022044207% 473 pages, 73 errors, 8.2 pages/s, 
680 kb/s, 
051110 214552 task_m_8pwl0q 0.0022044207% 473 pages, 73 errors, 8.2 pages/s, 
680 kb/s, 
051110 214552 Task task_m_8pwl0q is done.
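
For anyone who wants to check a full task log for the same pattern, here is a minimal sketch in Python. It assumes each status line sits on one physical line in the raw log (the wrapping above comes from the mail client) and that the format is exactly what the excerpt shows. The function names parse_status and stalled_progress are made up for illustration and are not part of Nutch.

```python
import re

# Matches fetcher status lines of the form seen in the excerpt, e.g.
# 051110 214543 task_m_8pwl0q 0.0022044207% 470 pages, 71 errors, 9.4 pages/s,
# Assumption: one status line per physical line in the unwrapped log.
STATUS = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{6}) (?P<task>task_\w+) "
    r"(?P<pct>\d+(?:\.\d+)?)% (?P<pages>\d+) pages, (?P<errors>\d+) errors"
)


def parse_status(lines):
    """Yield (time, percent, pages, errors) for each status line found."""
    for line in lines:
        m = STATUS.match(line)
        if m:
            yield m["time"], float(m["pct"]), int(m["pages"]), int(m["errors"])


def stalled_progress(lines):
    """Return True if the page count rose while the reported percentage did not."""
    rows = list(parse_status(lines))
    if len(rows) < 2:
        return False
    first, last = rows[0], rows[-1]
    return last[2] > first[2] and last[1] == first[1]
```

Feeding it the lines of an unpacked task log (for example, read through gzip.open in text mode) should flag runs like the one above, where pages climb from 470 to 473 while the percentage stays at 0.0022044207.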
-- 
Rod Taylor <[EMAIL PROTECTED]>

Attachment: task.log.gz
Description: GNU Zip compressed data
