[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152140#comment-15152140 ]
Markus Jelsma commented on NUTCH-2191:
--------------------------------------
Hi - it works indeed. But new problems appear, as usual!
1. SSL does not work due to the following error:
{code}
2016-02-18 11:53:21,130 ERROR htmlunit.Http - Failed to get protocol output
java.lang.IllegalArgumentException: Cannot locate declared field org.apache.http.impl.client.HttpClientBuilder.sslContext
	at org.apache.commons.lang3.reflect.FieldUtils.readDeclaredField(FieldUtils.java:382)
	at com.gargoylesoftware.htmlunit.HttpWebConnection.createConnectionManager(HttpWebConnection.java:944)
	at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:161)
	at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1321)
	at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1238)
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:346)
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:415)
	at org.apache.nutch.protocol.htmlunit.HttpResponse.<init>(HttpResponse.java:103)
{code}
2. I don't know how yet, but since it uses Selenium, a browser window opens every time I try a file! This is crazy; I didn't know this was even possible.
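For the SSL error in item 1, the trace shows HtmlUnit reading the private field `sslContext` from `HttpClientBuilder` through `FieldUtils.readDeclaredField`. It then fails to find that field. One possible cause is that a different httpclient version is being loaded than the one HtmlUnit expects. That is an assumption, not something this trace confirms. The sketch below is a hypothetical diagnostic (the class `SslContextFieldCheck` and its methods are made up for illustration). Run it with the same classpath as the crawler. It reports whether the loaded `HttpClientBuilder` declares the field and which jar it came from.

```java
import java.lang.reflect.Field;
import java.security.CodeSource;

// Hypothetical diagnostic: checks whether a class on the current classpath
// declares a given field, mirroring the "declared field" lookup that fails above.
public class SslContextFieldCheck {

    /** True if the class itself (not a superclass) declares a field with this name. */
    static boolean declaresField(Class<?> cls, String fieldName) {
        for (Field f : cls.getDeclaredFields()) {
            if (f.getName().equals(fieldName)) {
                return true;
            }
        }
        return false;
    }

    /** Location (usually a jar path) the class was loaded from, if known. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "unknown (bootstrap class loader)";
        }
        return src.getLocation().toString();
    }

    /** Loads the class by name and reports OK, MISSING or NOT FOUND. */
    static String check(String className, String fieldName) {
        try {
            Class<?> cls = Class.forName(className);
            String status = declaresField(cls, fieldName) ? "OK: " : "MISSING: ";
            return status + className + "." + fieldName + " loaded from " + locationOf(cls);
        } catch (ClassNotFoundException e) {
            return "NOT FOUND: " + className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(check("org.apache.http.impl.client.HttpClientBuilder", "sslContext"));
    }
}
```

A `MISSING` result would suggest that the loaded httpclient jar lacks the field HtmlUnit expects. The printed location shows which jar that is.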
Markus
> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch, NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited for very large-scale
> crawls. This issue is an attempt to implement protocol-htmlunit.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)