[ 
https://issues.apache.org/jira/browse/NUTCH-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157344#comment-17157344
 ] 

Mihir Sharma commented on NUTCH-2798:
-------------------------------------

Hi [~balaShashanka] [~snagel]

Thanks for your reply. I think some timeout exceptions are occurring.

These are the logs:

0 queues
0 queues
-activeThreads=0
Jul 14, 2020 6:24:10 PM org.restlet.engine.log.LogFilter afterHandle
INFO: 2020-07-14 18:24:10 127.0.0.1 - - 8081 GET /job/default-FETCH-1045630350 - 200 - 0 0 http://localhost:8081 Java/1.8.0_252 -
Jul 14, 2020 6:24:11 PM org.restlet.engine.log.LogFilter afterHandle
INFO: 2020-07-14 18:24:11 127.0.0.1 - - 8081 GET /job/default-FETCH-1045630350 - 200 - 0 0 http://localhost:8081 Java/1.8.0_252 -
fetch of http://leginfo.legislature.ca.gov/faces/billResultsClient.xhtml?location=CS48&agendadate=07%2F14%2F2020&description=Sen+Governmental+Organization failed {color:#FF0000}with: java.net.SocketTimeoutException: Read timed out{color}
fetching http://leginfo.legislature.ca.gov/faces/billResultsClient.xhtml?location=CX39&agendadate=07%2F14%2F2020&description=Asm+Communications+and+Conveyance (queue crawl delay=10000ms)
fetching http://leginfo.legislature.ca.gov/faces/javax.faces.resource/jsf.js?ln=javax.faces (queue crawl delay=10000ms)
fetching http://leginfo.legislature.ca.gov/faces/dailyUpdates.xhtml?house=A&sortOrder=asc (queue crawl delay=10000ms)
-finishing thread FetcherThread18, activeThreads=19
-finishing thread FetcherThread8, activeThreads=18
-finishing thread FetcherThread7, activeThreads=17
-finishing thread FetcherThread3, activeThreads=16
-finishing thread FetcherThread15, activeThreads=15
-finishing thread FetcherThread16, activeThreads=14
-finishing thread FetcherThread5, activeThreads=13
-finishing thread FetcherThread19, activeThreads=12
-finishing thread FetcherThread14, activeThreads=11
-finishing thread FetcherThread9, activeThreads=10
-finishing thread FetcherThread2, activeThreads=9
-finishing thread FetcherThread10, activeThreads=8
-finishing thread FetcherThread12, activeThreads=7
-finishing thread FetcherThread17, activeThreads=6
-finishing thread FetcherThread13, activeThreads=5
-finishing thread FetcherThread6, activeThreads=4
-finishing thread FetcherThread1, activeThreads=3
-finishing thread FetcherThread11, activeThreads=2
-finishing thread FetcherThread4, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0

 

I also edited the config and set http.timeout to 1000000.
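
For reference, a minimal sketch of what this override would typically look like, assuming it was made in conf/nutch-site.xml (the file is not named above, so that location is an assumption). http.timeout is the network timeout in milliseconds, so this value is roughly 16.7 minutes:

```xml
<!-- conf/nutch-site.xml (assumed location of the override) -->
<configuration>
  <property>
    <name>http.timeout</name>
    <!-- Network timeout in milliseconds; 1000000 ms is about 16.7 minutes -->
    <value>1000000</value>
  </property>
</configuration>
```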
 

> Nutch v2.4 Not Able to crawl after javax.faces.viewstate
> --------------------------------------------------------
>
>                 Key: NUTCH-2798
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2798
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 2.4
>         Environment: Ubuntu mate
>            Reporter: Mihir Sharma
>            Priority: Trivial
>         Attachments: hadoop.log.2020-07-10, 
> image-2020-07-06-20-20-49-580.png, image-2020-07-06-20-22-07-351.png, 
> image-2020-07-09-19-43-28-586.png, image-2020-07-09-21-07-32-811.png
>
>
> Nutch v2.4 does not crawl the HTML page past the input tag named 
> javax.faces.viewstate. It crawls the content before this tag but cannot 
> continue past this viewstate field, whose value contains many special characters.
> The page has several tabs. The crawler currently fetches content up to the 
> publication date (Date Published: 06/30/2020 09:00 PM). After that, it cannot 
> fetch anything from the title, *Assembly Bill No. 103*, onward.
> I am crawling this site: 
> [http://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201920200AB103]
>  
> !image-2020-07-06-20-20-49-580.png!
>  
> This is the output I am getting after crawling.
>  
> !image-2020-07-06-20-22-07-351.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
