[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213397#comment-15213397
]
Karanjeet Singh commented on NUTCH-2191:
----------------------------------------
[~lewismc]
I think this has already been done, and the updated patch addresses most
of these comments. That said, if I missed anything, please let me know.
[~markus17]:
* obvious System.out calls need to be removed - *Removed*
* we need to consider whether we actually need
NicelyResynchronizingAjaxController - *Since this is the basic implementation,
I don't see a need to include the AjaxController*
* timeouts in HttpResponse need to be configurable, at least - *Done. Can be
configured through nutch-site.xml*
* CSS and JavaScript enabled should be configurable, CSS disabled by default,
JavaScript enabled by default - *Done. Can be configured through nutch-site.xml*
* plugin needs to be listed in default.properties and build.xml - *Done for
build.xml. I think default.properties is still missing; it can be added*
* writing a screenshot directly to disk is odd when running on Hadoop/HDFS, it
would be better to use Hadoop's IO so we can write it on disk or HDFS
transparently - *I have followed the same procedure used for protocol-selenium.
Isn't that the intended one?*
* finally, we still need to address the redirect problem I described in
HttpResponse. - *Done.*
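
As a rough illustration of the defaults requested above (CSS off, JavaScript
on, timeout configurable), here is a minimal Java sketch. The property names,
the 3000 ms default timeout, and the HtmlUnitOptions class are hypothetical
and not taken from the patch; the actual plugin would presumably read its
settings from the Nutch configuration (nutch-site.xml) rather than from
java.util.Properties, which is used here only to keep the example
self-contained.

```java
import java.util.Properties;

/** Hypothetical holder for the options discussed above; not the plugin's actual class. */
final class HtmlUnitOptions {
    // Hypothetical property names, chosen for illustration only.
    static final String JS_ENABLED = "htmlunit.enable.javascript";
    static final String CSS_ENABLED = "htmlunit.enable.css";
    static final String TIMEOUT_MS = "htmlunit.timeout.ms";

    final boolean javascriptEnabled;
    final boolean cssEnabled;
    final int timeoutMs;

    private HtmlUnitOptions(boolean javascriptEnabled, boolean cssEnabled, int timeoutMs) {
        this.javascriptEnabled = javascriptEnabled;
        this.cssEnabled = cssEnabled;
        this.timeoutMs = timeoutMs;
    }

    /** Reads the options, falling back to: JavaScript on, CSS off, 3000 ms timeout (assumed value). */
    static HtmlUnitOptions from(Properties props) {
        boolean js = Boolean.parseBoolean(props.getProperty(JS_ENABLED, "true"));
        boolean css = Boolean.parseBoolean(props.getProperty(CSS_ENABLED, "false"));
        int timeout = Integer.parseInt(props.getProperty(TIMEOUT_MS, "3000"));
        return new HtmlUnitOptions(js, css, timeout);
    }

    public static void main(String[] args) {
        // Override two of the three settings, as a site-level config might.
        Properties overrides = new Properties();
        overrides.setProperty(CSS_ENABLED, "true");
        overrides.setProperty(TIMEOUT_MS, "10000");

        HtmlUnitOptions opts = from(overrides);
        System.out.println("javascript=" + opts.javascriptEnabled
                + " css=" + opts.cssEnabled
                + " timeoutMs=" + opts.timeoutMs);
    }
}
```

Keeping the defaults in one place like this means an empty configuration still
yields the behaviour asked for in the review (JavaScript enabled, CSS
disabled), while any single value can be overridden independently.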
> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Chris A. Mattmann
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch, NUTCH-2191.patch, NUTCH-2191.patch,
> NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited for very large scale
> crawls. This issue is an attempt to implement protocol-htmlunit.