Mihir Sharma created NUTCH-2798:
-----------------------------------

             Summary: Nutch v2.4 Not Able to crawl after javax.faces.viewstate
                 Key: NUTCH-2798
                 URL: https://issues.apache.org/jira/browse/NUTCH-2798
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 2.4
         Environment: Ubuntu mate
            Reporter: Mihir Sharma
         Attachments: image-2020-07-06-20-20-49-580.png, 
image-2020-07-06-20-22-07-351.png

Nutch v2.4 does not crawl the HTML page past the input tag named 
javax.faces.viewstate. It crawls the content before this tag but cannot get 
past it. The value of this viewstate field contains many special characters.

This page has several tabs. The crawler currently fetches content up to the 
date (Date Published: 06/30/2020 09:00 PM). After that it fails to fetch 
anything from the title, *Assembly Bill No. 103*, onward.

I am crawling this site: 
[http://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201920200AB103]

 

!image-2020-07-06-20-20-49-580.png!

 

This is the output I get after crawling:

 

!image-2020-07-06-20-22-07-351.png!

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
