[ https://issues.apache.org/jira/browse/NUTCH-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156648#comment-17156648 ]
Sebastian Nagel commented on NUTCH-2798:
----------------------------------------
Hi [~Mihir22], there are also errors related to the Nutch server and WebUI:
NotSerializableException, TimeoutException, UniformInterfaceException. As
[~balaShashanka] stated, the jobs continue running in the background but
they're reported as errors in the WebUI. You need to address these problems, or
try crawling without the server/WebUI by using the command-line script bin/crawl.
> Nutch v2.4 Not Able to crawl after javax.faces.viewstate
> --------------------------------------------------------
>
> Key: NUTCH-2798
> URL: https://issues.apache.org/jira/browse/NUTCH-2798
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 2.4
> Environment: Ubuntu MATE
> Reporter: Mihir Sharma
> Priority: Trivial
> Attachments: hadoop.log.2020-07-10,
> image-2020-07-06-20-20-49-580.png, image-2020-07-06-20-22-07-351.png,
> image-2020-07-09-19-43-28-586.png, image-2020-07-09-21-07-32-811.png
>
>
> Nutch v2.4 does not crawl the HTML page past the input tag named
> javax.faces.viewstate. It crawls the content before this tag but cannot get
> past the javax.faces.viewstate value, which contains many special characters.
> The page has several tabs. The crawler currently fetches content up to the
> date (Date Published: 06/30/2020 09:00 PM); after that it fails to fetch
> anything from the title *Assembly Bill No. 103* onward.
> I am crawling this site:
> [http://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201920200AB103]
>
> !image-2020-07-06-20-20-49-580.png!
>
> This is the output I am getting after crawling.
>
> !image-2020-07-06-20-22-07-351.png!
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)