[ 
https://issues.apache.org/jira/browse/NUTCH-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128738#comment-16128738
 ] 

Sebastian Nagel commented on NUTCH-2378:
----------------------------------------

Applied patch/PR to 1.x. Pull request opened for 2.x as well.

There was a gory little problem in 2.x:
{noformat}
Testcase: testJsoupHtmlParser took 1.702 sec
        Caused an ERROR
org.apache.nutch.parse.jsoup.extractor.ViewCountNormalizer cannot be cast to org.apache.nutch.core.jsoup.extractor.normalizer.Normalizable
java.lang.ClassCastException: org.apache.nutch.parse.jsoup.extractor.ViewCountNormalizer cannot be cast to org.apache.nutch.core.jsoup.extractor.normalizer.Normalizable
        at org.apache.nutch.core.jsoup.extractor.JsoupDocumentReader.parseNormalizers(JsoupDocumentReader.java:103)
        at org.apache.nutch.core.jsoup.extractor.JsoupDocumentReader.parse(JsoupDocumentReader.java:78)
        at org.apache.nutch.core.jsoup.extractor.JsoupDocumentReader.<init>(JsoupDocumentReader.java:173)
        at org.apache.nutch.core.jsoup.extractor.JsoupDocumentReader.getInstance(JsoupDocumentReader.java:60)
        at org.apache.nutch.parse.jsoup.extractor.JsoupHtmlParser.setConf(JsoupHtmlParser.java:116)
        at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:163)
        at org.apache.nutch.parse.ParseFilters.<init>(ParseFilters.java:65)
        at org.apache.nutch.parse.html.HtmlParser.setConf(HtmlParser.java:345)
        at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:163)
        at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:131)
        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:143)
        at org.apache.nutch.parse.jsoup.extractor.TestJsoupHtmlParser.testJsoupHtmlParser(TestJsoupHtmlParser.java:70)
{noformat}

A consequence of the child-first classloader is that classes and interfaces 
living in the plugin's classloader cannot be used from another classloader 
"outside". This also applies to unit tests which access the plugin the 
"normal" way (here via "ParseUtil.parse()"). The fix was trivial: move the 
class ViewCountNormalizer from the test classpath into the plugin jar 
(classpath of the plugin).
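
For readers unfamiliar with the lookup order, here is a minimal sketch of a 
child-first loader. It is not the class from the attached patch: the name 
ChildFirstClassLoader and all details below are assumptions made for 
illustration only. It looks in its own URLs first and falls back to the usual 
parent-first delegation when the class is not found there.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first loader, for illustration only.
 * This is NOT the implementation from the NUTCH-2378 patch.
 */
class ChildFirstClassLoader extends URLClassLoader {

  ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve)
      throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      // Reuse the class if this loader has already defined it.
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        try {
          // Child first: search this loader's own URLs (the plugin jars).
          c = findClass(name);
        } catch (ClassNotFoundException e) {
          // Not in the plugin: fall back to normal parent-first delegation.
          c = super.loadClass(name, false);
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}
```

At runtime the JVM identifies a class by its name together with the loader 
that defined it. If the same interface is defined once by such a loader and 
once by its parent, the result is two distinct types, and a cast between them 
fails with a ClassCastException like the one above.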

> ChildFirst plugin classloader
> -----------------------------
>
>                 Key: NUTCH-2378
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2378
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>    Affects Versions: 1.13
>            Reporter: Jurian Broertjes
>             Fix For: 2.4, 1.14
>
>         Attachments: NUTCH-2378-childfirst-plugin-classloader.patch
>
>
> While working on upgrading the indexer-elastic plugin from 2.x to 5.x, I ran 
> into several nasty runtime dependency issues (both local and on Hadoop). 
> After seeking help on the mailing list, I still was unable to resolve these 
> issues and after digging further, decided to try a different plugin 
> classloader strategy. 
> The normal classloader delegates class loading requests to its parent 
> classloader. This can cause all sorts of nasty runtime dependency version 
> conflicts (jar hell, version conflicts), since the plugin's own classloader 
> gets queried last. The child-first classloader approach tries to load a class 
> from the plugin's dependencies first and when unavailable, delegates to its 
> parent classloader. This fixed the issues I had.
> The new approach can give runtime LinkageErrors, but these are easily 
> resolvable (see the patch for a few examples).
> I've tested the new loader a bit and am curious about others' findings.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
