[
https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470996#comment-17470996
]
Hudson commented on NUTCH-2429:
-------------------------------
SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #66 (See
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/66/])
NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their
URLStreamHandlers (#720) (github:
[https://github.com/apache/nutch/commit/e76d69fe13902fd2f3a98660dd2bac52c2ea568c])
* (edit) src/java/org/apache/nutch/plugin/PluginManifestParser.java
* (edit) src/plugin/indexer-csv/src/java/org/apache/nutch/indexwriter/csv/CSVIndexWriter.java
* (edit) build.xml
* (add) src/plugin/protocol-foo/build.xml
* (edit) src/java/org/apache/nutch/util/NutchTool.java
* (edit) src/java/org/apache/nutch/parse/ParserChecker.java
* (add) src/plugin/protocol-foo/src/java/org/apache/nutch/protocol/foo/Foo.java
* (edit) src/plugin/build.xml
* (edit) src/java/org/apache/nutch/util/SitemapProcessor.java
* (edit) src/java/org/apache/nutch/crawl/CrawlDbReader.java
* (edit) src/java/org/apache/nutch/util/CrawlCompletionStats.java
* (add) src/plugin/protocol-foo/src/java/org/apache/nutch/protocol/foo/Handler.java
* (edit) src/java/org/apache/nutch/plugin/PluginRepository.java
* (add) src/plugin/protocol-foo/plugin.xml
* (edit) src/plugin/indexer-rabbit/src/java/org/apache/nutch/indexwriter/rabbit/RabbitIndexWriter.java
* (edit) src/plugin/any23/src/java/org/apache/nutch/any23/Any23IndexingFilter.java
* (edit) src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java
* (edit) src/java/org/apache/nutch/util/NutchJob.java
* (edit) src/java/org/apache/nutch/util/domain/DomainStatistics.java
* (add) src/plugin/protocol-foo/ivy.xml
* (add) src/java/org/apache/nutch/plugin/URLStreamHandlerFactory.java
> Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
> -----------------------------------------------------------------------------
>
> Key: NUTCH-2429
> URL: https://issues.apache.org/jira/browse/NUTCH-2429
> Project: Nutch
> Issue Type: Improvement
> Components: commoncrawl
> Affects Versions: 1.14
> Environment: Tested on both Nutch 1.13 and 1.14 in Ubuntu Linux with
> OpenJDK 1.8.
> Reporter: Hiran Chaudhuri
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 1.19
>
>
> While trying to use the protocol-smb plugin (which is not part of the Nutch
> distribution) I realized there are four steps to successfully make use of a
> protocol plugin:
> 1 - put the artifact into the plugins directory
> 2 - modify the Nutch configuration files to allow smb:// URLs and add the
> plugin to the list of loaded plugins
> 3 - extract jcifs.jar and place it on the system classpath
> 4 - run Nutch with the correct system property
> While steps 1 and 2 seem obvious, steps 3 and 4 require knowledge of plugin
> internals, which does not feel right for Nutch and plugin users. What is more,
> jcifs.jar would then exist twice on the classpath and could cause further
> problems at runtime.
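
For readers unfamiliar with the mechanism behind steps 3 and 4: the JVM resolves URL schemes such as smb:// through java.net.URLStreamHandler instances, and an application can supply its own java.net.URLStreamHandlerFactory instead of relying on the system classpath and a system property. The following is only a minimal sketch of that general technique, not the code from the commit above. The class names PluginUrlHandlers, Factory and FooHandler and the helper canParse are hypothetical and chosen for illustration.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; none of these names come from Nutch.
class PluginUrlHandlers {

  /** Maps a URL scheme to a handler that a plugin could supply. */
  static class Factory implements URLStreamHandlerFactory {
    private final Map<String, URLStreamHandler> handlers = new ConcurrentHashMap<>();

    void register(String protocol, URLStreamHandler handler) {
      handlers.put(protocol, handler);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
      // Returning null lets the JVM fall back to its built-in handlers.
      return handlers.get(protocol);
    }
  }

  /** Placeholder handler for foo:// URLs; it can parse URLs but not connect. */
  static class FooHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL url) throws IOException {
      throw new IOException("foo: connections are not implemented in this sketch");
    }
  }

  /** Returns true if the factory knows the scheme and the URL parses with it. */
  static boolean canParse(Factory factory, String spec) {
    int colon = spec.indexOf(':');
    if (colon <= 0) {
      return false;
    }
    URLStreamHandler handler = factory.createURLStreamHandler(spec.substring(0, colon));
    if (handler == null) {
      return false;
    }
    try {
      new URL(null, spec, handler);
      return true;
    } catch (MalformedURLException e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    Factory factory = new Factory();
    factory.register("foo", new FooHandler());
    // The JVM accepts a factory only once; a second call throws an Error.
    URL.setURLStreamHandlerFactory(factory);
    System.out.println(new URL("foo://host/share/file.txt").getProtocol());
  }
}
```

With a factory like this installed, no extra system property is needed for the registered schemes, and unregistered schemes such as http still resolve through the JVM defaults.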