[
https://issues.apache.org/jira/browse/NUTCH-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168253#comment-15168253
]
Lewis John McGibbney commented on NUTCH-2235:
---------------------------------------------
The source of this issue is ordering of Nutch dependencies on the Java
classpath. Right now Hadoop 2.X uses httpclient and httpcore v4.2.5. whereas
the Selnium implementation in Nutch uses the following
{code}
classes/plugins/lib-selenium/httpclient-4.5.1.jar
classes/plugins/lib-selenium/httpcore-4.4.3.jar
{code}
This is not a compatible scenario.
> Classpath discrepancy with protocol-selenium in deploy mode
> -----------------------------------------------------------
>
> Key: NUTCH-2235
> URL: https://issues.apache.org/jira/browse/NUTCH-2235
> Project: Nutch
> Issue Type: Bug
> Components: build, plugin, protocol
> Affects Versions: 1.11
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Critical
> Fix For: 1.12
>
>
> when running with protocol-selenium in deploy mode I observe the following
> behaviour
> {code}
> lmcgibbn@LMC-032857 /usr/local/nutch(master) $ ./runtime/deploy/bin/nutch
> parsechecker -dumpText "http://www.jpl.nasa.gov"
> 16/02/25 15:22:08 INFO parse.ParserChecker: fetching: http://www.jpl.nasa.gov
> 16/02/25 15:22:08 INFO plugin.PluginRepository: Plugins: looking in:
> /usr/local/hadoop-2.5.2/hd-tmp/hadoop-unjar6419843999522854503/classes/plugins
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Plugin Auto-activation mode:
> [true]
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Registered Plugins:
> 16/02/25 15:22:09 INFO plugin.PluginRepository: the nutch core
> extension points (nutch-extensionpoints)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Basic URL Normalizer
> (urlnormalizer-basic)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Html Parse Plug-in
> (parse-html)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Basic Indexing Filter
> (index-basic)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Http Protocol Plug-in
> (protocol-selenium)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: SolrIndexWriter
> (indexer-solr)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: HTTP Framework
> (lib-http)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Regex URL Filter
> (urlfilter-regex)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Pass-through URL
> Normalizer (urlnormalizer-pass)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Regex URL Normalizer
> (urlnormalizer-regex)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: CyberNeko HTML Parser
> (lib-nekohtml)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Tika Parser Plug-in
> (parse-tika)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: OPIC Scoring Plug-in
> (scoring-opic)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Anchor Indexing Filter
> (index-anchor)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: HTTP Framework
> (lib-selenium)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Regex URL Filter
> Framework (lib-regex-filter)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Registered Extension-Points:
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Nutch URL Normalizer
> (org.apache.nutch.net.URLNormalizer)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Nutch Protocol
> (org.apache.nutch.protocol.Protocol)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Nutch Segment Merge
> Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Nutch URL Filter
> (org.apache.nutch.net.URLFilter)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Nutch Index Writer
> (org.apache.nutch.indexer.IndexWriter)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Nutch Indexing Filter
> (org.apache.nutch.indexer.IndexingFilter)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: HTML Parse Filter
> (org.apache.nutch.parse.HtmlParseFilter)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Nutch Content Parser
> (org.apache.nutch.parse.Parser)
> 16/02/25 15:22:09 INFO plugin.PluginRepository: Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter)
> 16/02/25 15:22:09 INFO protocol.RobotRulesParser: robots.txt whitelist not
> configured.
> 16/02/25 15:22:09 INFO selenium.Http: http.proxy.host = null
> 16/02/25 15:22:09 INFO selenium.Http: http.proxy.port = 8080
> 16/02/25 15:22:09 INFO selenium.Http: http.proxy.exception.list = false
> 16/02/25 15:22:09 INFO selenium.Http: http.timeout = 10000
> 16/02/25 15:22:09 INFO selenium.Http: http.content.limit = -1
> 16/02/25 15:22:09 INFO selenium.Http: http.agent =
> nutch_test/Nutch-1.12-SNAPSHOT
> 16/02/25 15:22:09 INFO selenium.Http: http.accept.language =
> en-us,en-gb,en;q=0.7,*;q=0.3
> 16/02/25 15:22:09 INFO selenium.Http: http.accept =
> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 16/02/25 15:22:09 ERROR selenium.Http: Failed to get protocol output
> java.lang.NoSuchFieldError: INSTANCE
> at
> org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:52)
> at
> org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:56)
> at
> org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<clinit>(DefaultHttpRequestWriterFactory.java:46)
> at
> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<init>(ManagedHttpClientConnectionFactory.java:72)
> at
> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<init>(ManagedHttpClientConnectionFactory.java:84)
> at
> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<clinit>(ManagedHttpClientConnectionFactory.java:59)
> at
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager$InternalConnectionFactory.<init>(PoolingHttpClientConnectionManager.java:493)
> at
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:149)
> at
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:138)
> at
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:114)
> at
> org.openqa.selenium.remote.internal.HttpClientFactory.getClientConnectionManager(HttpClientFactory.java:74)
> at
> org.openqa.selenium.remote.internal.HttpClientFactory.<init>(HttpClientFactory.java:57)
> at
> org.openqa.selenium.remote.internal.HttpClientFactory.<init>(HttpClientFactory.java:60)
> at
> org.openqa.selenium.remote.internal.ApacheHttpClient$Factory.getDefaultHttpClientFactory(ApacheHttpClient.java:251)
> at
> org.openqa.selenium.remote.internal.ApacheHttpClient$Factory.<init>(ApacheHttpClient.java:228)
> at
> org.openqa.selenium.remote.HttpCommandExecutor.getDefaultClientFactory(HttpCommandExecutor.java:96)
> at
> org.openqa.selenium.remote.HttpCommandExecutor.<init>(HttpCommandExecutor.java:70)
> at
> org.openqa.selenium.remote.HttpCommandExecutor.<init>(HttpCommandExecutor.java:58)
> at
> org.openqa.selenium.firefox.internal.NewProfileExtensionConnection.start(NewProfileExtensionConnection.java:97)
> at
> org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:271)
> at
> org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:117)
> at
> org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:216)
> at
> org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:211)
> at
> org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:207)
> at
> org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:120)
> at
> org.apache.nutch.protocol.selenium.HttpWebClient.getDriverForPage(HttpWebClient.java:75)
> at
> org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:155)
> at
> org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:244)
> at
> org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:168)
> at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
> at
> org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:261)
> at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:136)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:265)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Fetch failed with protocol status: exception(16), lastModified=0:
> java.lang.NoSuchFieldError: INSTANCE
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)