[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213397#comment-15213397 ]
Karanjeet Singh commented on NUTCH-2191:
----------------------------------------

[~lewismc] I think this has already been done, and the updated patch addresses most of these comments. Having said that, if I missed anything, please let me know

[~markus17]:
* obvious system.outs need to be removed - *Removed*
* we need to consider whether we actually need NicelyResynchronizingAjaxController - *Since this is the basic implementation, I don't see a need to include the AjaxController*
* time outs in HttpResponse need to be configurable at least - *Done. Can be configured through nutch-site.xml*
* CSS and javascript enabled should be configurable, css disabled by default, Javascript enabled by default - *Done. Can be configured through nutch-site.xml*
* plugin needs to be listed in default.properties and build.xml - *Done. I think default.properties is still left, which can be added*
* writing a screenshot directly to disk is odd when running on Hadoop/HDFS, it would be better to use Hadoop's IO so we can write it on disk or HDFS transparently - *I have followed the same procedure used for protocol-selenium. Isn't that the intended one?*
* finally, we still need to address the redirect problem I described in HttpResponse. - *Done.*

> Add protocol-htmlunit
> ---------------------
>
>                 Key: NUTCH-2191
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2191
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: 1.11
>            Reporter: Markus Jelsma
>            Assignee: Chris A. Mattmann
>             Fix For: 1.12
>
>         Attachments: NUTCH-2191.patch, NUTCH-2191.patch, NUTCH-2191.patch, NUTCH-2191.patch
>
>
> HtmlUnit is, opposed to other Javascript enabled headless browsers, a portable library and should therefore be better suited for very large scale crawls. This issue is an attempt to implement protocol-htmlunit.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)