Mihir Sharma created NUTCH-2798:
-----------------------------------
Summary: Nutch v2.4 Not Able to crawl after javax.faces.viewstate
Key: NUTCH-2798
URL: https://issues.apache.org/jira/browse/NUTCH-2798
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 2.4
Environment: Ubuntu mate
Reporter: Mihir Sharma
Attachments: image-2020-07-06-20-20-49-580.png,
image-2020-07-06-20-22-07-351.png
Nutch v2.4 does not crawl the whole HTML page. It extracts the content that
comes before the input tag named javax.faces.viewstate, but it does not get
past that tag, whose value contains a lot of special characters.
The page has several tabs. The crawler currently fetches the content up to
the publication date (Date Published: 06/30/2020 09:00 PM). After that it
fails to fetch anything from the title *Assembly Bill No. 103* onward.
I am crawling this page:
[http://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201920200AB103]
!image-2020-07-06-20-20-49-580.png!
This is the output I am getting after crawling.
!image-2020-07-06-20-22-07-351.png!
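To help narrow down where the content stops, here is a minimal diagnostic sketch. It is not Nutch code and does not claim to identify the cause. It only splits the visible text of an HTML string into the part before and the part after the javax.faces.ViewState input, so the raw fetched content can be compared with the parsed output. The class name `ViewStateProbe`, the function `probe`, and the `SAMPLE` markup (including the ViewState value) are assumptions made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class ViewStateProbe(HTMLParser):
    """Collects visible text before and after the ViewState input (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.seen_viewstate = False
        self.text_before = []
        self.text_after = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names but not values, so compare case-insensitively.
        name = dict(attrs).get("name") or ""
        if tag == "input" and name.lower() == "javax.faces.viewstate":
            self.seen_viewstate = True

    def handle_data(self, data):
        text = data.strip()
        if text:
            bucket = self.text_after if self.seen_viewstate else self.text_before
            bucket.append(text)


def probe(html):
    """Return (viewstate_found, text_before_it, text_after_it) for an HTML string."""
    parser = ViewStateProbe()
    parser.feed(html)
    parser.close()
    return parser.seen_viewstate, " ".join(parser.text_before), " ".join(parser.text_after)


# Made-up markup that only mimics the layout described in this issue.
SAMPLE = """<html><body>
<span>Date Published: 06/30/2020 09:00 PM</span>
<input type="hidden" name="javax.faces.ViewState" value="H4sIAAAA+/=" />
<h1>Assembly Bill No. 103</h1>
</body></html>"""

if __name__ == "__main__":
    found, before, after = probe(SAMPLE)
    print("ViewState input found:", found)
    print("Text before:", before)
    print("Text after:", after)
```

If the raw content stored by the crawler gives an empty "text after" while the same page saved from a browser does not, the content was probably cut off before parsing. If both contain the text after the input, the parsing step is the more likely place to look.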
--
This message was sent by Atlassian Jira
(v8.3.4#803005)