[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307481#comment-14307481
 ] 

Mo Omer commented on NUTCH-1933:
--------------------------------

Right on - glad you all found it useful enough to integrate. As I mentioned on 
GitHub, I'd definitely recommend also including the selenium-grid plugin, 
since it's a much saner approach to integrating with Selenium.

When I cobbled this together, I was under a pretty hard deadline and left a 
lot of cruft in. The cleanup I'd suggest: all references to, and files 
belonging to, the old html-unit should be removed; the .idea files/directories 
that my .gitignore missed should be tossed out; HttpResponse.java should be 
nearly empty when completed; and HttpWebClient should allow the tag for which 
Selenium collects innerHtml to be configured (right now it's just 'body', with 
no config option).
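
To make that last point concrete, here is a rough sketch of what a 
configurable tag could look like. It is only an illustration, not the plugin's 
actual code: the property name "selenium.content.tag", the class name, and 
both helper methods are made up for this example, and java.util.Properties 
stands in for the Nutch/Hadoop Configuration object so the snippet runs on its 
own. The Selenium lookup is replaced by a plain function for the same reason.

```java
import java.util.Properties;
import java.util.function.Function;

public class ConfigurableTagSketch {
    // Hypothetical property name, invented for this sketch.
    static final String TAG_PROPERTY = "selenium.content.tag";
    // Matches the current behaviour described above: always 'body'.
    static final String DEFAULT_TAG = "body";

    // Returns the configured tag, falling back to 'body' when the
    // property is missing or blank.
    static String resolveTag(Properties conf) {
        String tag = conf.getProperty(TAG_PROPERTY, DEFAULT_TAG).trim();
        return tag.isEmpty() ? DEFAULT_TAG : tag;
    }

    // innerHtmlByTag stands in for the Selenium call; in a real client it
    // would be something along the lines of
    // driver.findElement(By.tagName(tag)).getAttribute("innerHTML").
    static String collectInnerHtml(Properties conf,
                                   Function<String, String> innerHtmlByTag) {
        return innerHtmlByTag.apply(resolveTag(conf));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Fake lookup so the sketch runs without a browser.
        Function<String, String> fakeLookup = tag -> "<!-- innerHTML of " + tag + " -->";

        System.out.println(collectInnerHtml(conf, fakeLookup));
        conf.setProperty(TAG_PROPERTY, "article");
        System.out.println(collectInnerHtml(conf, fakeLookup));
    }
}
```

Keeping 'body' as the fallback would mean existing setups behave the same 
unless someone sets the property.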

This, and some Hadoop work a couple of weeks after putting this together, was 
really the first time I'd used Java (outside of JRuby, which doesn't really 
count), so I apologize for the wack code smells I left in.

> nutch-selenium plugin
> ---------------------
>
>                 Key: NUTCH-1933
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1933
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>            Reporter: Mo Omer
>            Assignee: Lewis John McGibbney
>             Fix For: 1.10
>
>         Attachments: NUTCH-selenium-trunk.patch
>
>
> I updated the [nutch-selenium|https://github.com/momer/nutch-selenium] 
> plugin to run against trunk.
> I feel that there is a good bit of work to be done here; however, early 
> testing on my system shows that it works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
