[
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307481#comment-14307481
]
Mo Omer commented on NUTCH-1933:
--------------------------------
Right on - glad you all found it useful enough to integrate. As I mentioned on
GH, I'd definitely recommend also including the selenium-grid plugin, since
it's a wayyyy saner approach to integrating with Selenium.
When I cobbled this together, I was under pretty hard deadline pressure and
left a lot of cruft in. What still needs cleaning up: all references/files
belonging to the old html-unit code should be removed; the .idea
files/directories I'd missed in my .gitignore should be tossed out;
HttpResponse.java should be nearly empty when completed; and HttpWebClient
should allow the tag whose innerHTML Selenium collects to be configured
(right now it's just 'body' with no config options).
This, and some Hadoop work a couple of weeks after putting this together, was
really the first time I'd used Java (outside of JRuby, which doesn't really
count), so I apologize for the wack code smells I left in.
> nutch-selenium plugin
> ---------------------
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Reporter: Mo Omer
> Assignee: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch
>
>
> I updated the [nutch-selenium|https://github.com/momer/nutch-selenium]
> plugin to run against trunk.
> I feel that there is a good bit of work to be done here; however, early
> testing on my system indicates that it works.