[ 
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213397#comment-15213397
 ] 

Karanjeet Singh commented on NUTCH-2191:
----------------------------------------

[~lewismc]

I think this has already been done, and the updated patch addresses most of 
these comments. That said, if I missed anything, please let me know, 
[~markus17]:


* obvious System.out calls need to be removed - *Removed*
* we need to consider whether we actually need 
NicelyResynchronizingAjaxController - *Since this is the basic implementation, 
I don't see a need to include the AjaxController*
* timeouts in HttpResponse need to be configurable, at least - *Done. Can be 
configured through nutch-site.xml*
* enabling CSS and JavaScript should be configurable, with CSS disabled by 
default and JavaScript enabled by default - *Done. Can be configured through 
nutch-site.xml*
* plugin needs to be listed in default.properties and build.xml - *Done. I 
think only default.properties is left, which can be added*
* writing a screenshot directly to disk is odd when running on Hadoop/HDFS; it 
would be better to use Hadoop's IO so we can write it to disk or HDFS 
transparently - *I have followed the same procedure used for protocol-selenium. 
Isn't that the intended one?*
* finally, we still need to address the redirect problem I described in 
HttpResponse. - *Done.*
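
To make the configuration points above concrete, here is a minimal sketch of 
how the defaults from the review (JavaScript enabled, CSS disabled, a 
configurable timeout) could be resolved. It is not the code in the patch. The 
property names, the 10000 ms fallback, and the HtmlUnitSettings class are all 
hypothetical, and java.util.Properties only stands in for the Hadoop 
Configuration object that Nutch would populate from nutch-site.xml.

```java
import java.util.Properties;

/**
 * Illustrative holder for the settings discussed in the review.
 * Everything named here is an assumption, not taken from the patch.
 */
final class HtmlUnitSettings {
    // Hypothetical property names; the patch may use different keys.
    static final String JS_KEY = "htmlunit.enable.javascript";
    static final String CSS_KEY = "htmlunit.enable.css";
    static final String TIMEOUT_KEY = "htmlunit.timeout.ms";

    // Assumed fallback; the review only asks that the timeout be configurable.
    static final int DEFAULT_TIMEOUT_MS = 10000;

    final boolean javascriptEnabled;
    final boolean cssEnabled;
    final int timeoutMs;

    private HtmlUnitSettings(boolean javascriptEnabled, boolean cssEnabled, int timeoutMs) {
        this.javascriptEnabled = javascriptEnabled;
        this.cssEnabled = cssEnabled;
        this.timeoutMs = timeoutMs;
    }

    /** Reads the three settings, falling back to the defaults named in the review. */
    static HtmlUnitSettings from(Properties props) {
        // JavaScript is on unless explicitly switched off.
        boolean js = Boolean.parseBoolean(props.getProperty(JS_KEY, "true"));
        // CSS is off unless explicitly switched on.
        boolean css = Boolean.parseBoolean(props.getProperty(CSS_KEY, "false"));
        int timeout = Integer.parseInt(
                props.getProperty(TIMEOUT_KEY, String.valueOf(DEFAULT_TIMEOUT_MS)).trim());
        if (timeout < 0) {
            throw new IllegalArgumentException(TIMEOUT_KEY + " must not be negative: " + timeout);
        }
        return new HtmlUnitSettings(js, css, timeout);
    }
}
```

Inside the plugin the same lookups would presumably go through 
Configuration.getBoolean(name, default) and Configuration.getInt(name, default), 
with the results passed on to HtmlUnit's WebClientOptions 
(setJavaScriptEnabled, setCssEnabled, setTimeout).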

> Add protocol-htmlunit
> ---------------------
>
>                 Key: NUTCH-2191
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2191
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: 1.11
>            Reporter: Markus Jelsma
>            Assignee: Chris A. Mattmann
>             Fix For: 1.12
>
>         Attachments: NUTCH-2191.patch, NUTCH-2191.patch, NUTCH-2191.patch, 
> NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a 
> portable library and should therefore be better suited for very large-scale 
> crawls. This issue is an attempt to implement protocol-htmlunit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)