[ https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470996#comment-17470996 ]

Hudson commented on NUTCH-2429:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #66 (See
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/66/])
NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their 
URLStreamHandlers (#720) (github: 
[https://github.com/apache/nutch/commit/e76d69fe13902fd2f3a98660dd2bac52c2ea568c])
* (edit) src/java/org/apache/nutch/plugin/PluginManifestParser.java
* (edit) src/plugin/indexer-csv/src/java/org/apache/nutch/indexwriter/csv/CSVIndexWriter.java
* (edit) build.xml
* (add) src/plugin/protocol-foo/build.xml
* (edit) src/java/org/apache/nutch/util/NutchTool.java
* (edit) src/java/org/apache/nutch/parse/ParserChecker.java
* (add) src/plugin/protocol-foo/src/java/org/apache/nutch/protocol/foo/Foo.java
* (edit) src/plugin/build.xml
* (edit) src/java/org/apache/nutch/util/SitemapProcessor.java
* (edit) src/java/org/apache/nutch/crawl/CrawlDbReader.java
* (edit) src/java/org/apache/nutch/util/CrawlCompletionStats.java
* (add) src/plugin/protocol-foo/src/java/org/apache/nutch/protocol/foo/Handler.java
* (edit) src/java/org/apache/nutch/plugin/PluginRepository.java
* (add) src/plugin/protocol-foo/plugin.xml
* (edit) src/plugin/indexer-rabbit/src/java/org/apache/nutch/indexwriter/rabbit/RabbitIndexWriter.java
* (edit) src/plugin/any23/src/java/org/apache/nutch/any23/Any23IndexingFilter.java
* (edit) src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java
* (edit) src/java/org/apache/nutch/util/NutchJob.java
* (edit) src/java/org/apache/nutch/util/domain/DomainStatistics.java
* (add) src/plugin/protocol-foo/ivy.xml
* (add) src/java/org/apache/nutch/plugin/URLStreamHandlerFactory.java


> Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-2429
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2429
>             Project: Nutch
>          Issue Type: Improvement
>          Components: commoncrawl
>    Affects Versions: 1.14
>         Environment: Tested on both Nutch 1.13 and 1.14 in Ubuntu Linux with 
> OpenJDK 1.8.
>            Reporter: Hiran Chaudhuri
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.19
>
>
> While trying to use the protocol-smb plugin (which is not part of the Nutch 
> distribution) I realized there are four steps to successfully make use of a 
> protocol plugin:
> 1 - put the artifact into the plugins directory
> 2 - modify the Nutch configuration files to allow smb:// URLs and add the 
> plugin to the list of loaded plugins
> 3 - extract jcifs.jar and place it on the system classpath
> 4 - run Nutch with the correct system property
> While steps 1 and 2 seem obvious, steps 3 and 4 require knowledge of plugin 
> internals, which does not feel right for Nutch and plugin users. Moreover, 
> jcifs.jar would then exist twice on the classpath and could cause further 
> problems at runtime.
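
For readers unfamiliar with the JDK mechanism behind steps 3 and 4, here is a minimal sketch of how a protocol handler can be registered from code rather than through the system classpath. It is not the code from the commit above. It assumes that the "system property" in step 4 is the standard JDK property java.protocol.handler.pkgs, which the issue does not name. The class names FooHandlerSketch, FooHandler and SketchFactory, the "foo" scheme and the fixed response body are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

final class FooHandlerSketch {

    /** Hypothetical handler for "foo:" URLs; it serves a fixed body. */
    static class FooHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL url) {
            return new URLConnection(url) {
                @Override
                public void connect() {
                    connected = true; // nothing to open in this sketch
                }

                @Override
                public InputStream getInputStream() {
                    return new ByteArrayInputStream(
                        "hello from foo".getBytes(StandardCharsets.UTF_8));
                }
            };
        }
    }

    /** Returns a handler for "foo"; null lets the JDK use its defaults. */
    static class SketchFactory implements URLStreamHandlerFactory {
        @Override
        public URLStreamHandler createURLStreamHandler(String protocol) {
            return "foo".equals(protocol) ? new FooHandler() : null;
        }
    }

    private static boolean installed = false;

    /** The JDK accepts only one factory per JVM, so guard the call. */
    static synchronized void install() {
        if (!installed) {
            URL.setURLStreamHandlerFactory(new SketchFactory());
            installed = true;
        }
    }

    /** Reads the whole body behind the given URL as UTF-8 text. */
    static String fetch(String spec) throws IOException {
        try (InputStream in = new URL(spec).openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        install();
        System.out.println(fetch("foo://example/path"));
    }
}
```

Without the install() call, new URL("foo://example/path") fails with a MalformedURLException because the JDK knows no handler for the scheme. A factory registered in code can hand out handlers from wherever it obtains them, which is the kind of indirection that would make steps 3 and 4 unnecessary for plugin users.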



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
