On Wednesday 08 February 2012 18:27:32 Ken Krugler wrote:
> On Feb 8, 2012, at 5:28am, Markus Jelsma wrote:
> > On Wednesday 08 February 2012 14:22:36 Julien Nioche wrote:
> >> sorry don't understand what your issue is. We have a dependency on
> >> tika-parsers and the actual parser implementations (listed in tika
> >> parsers' POM) are pulled transitively just like any other dependency
> >> managed by Ivy. They end up being copied in
> >> runtime/local/plugins/parse-tika/ or put in the job in runtime/deploy/
> >
> > My problem is that i am working on some code for Tika-parsers
> > 1.1-SNAPSHOT that i need to use in Nutch. However, when i build
> > tika-parsers and put it in Nutch' lib directory i still seem to be
> > missing dependencies. Then trouble begins:
>
> I don't know anything about how Nutch handles jars in its lib directory,
> but this sounds like you have a "raw" jar (tika-parsers) without its
> pom.xml.
>
> So then Ivy (or Maven) doesn't know about the transitive dependencies on
> other jars, which are needed to implement the actual parsing support.
You're right, that's exactly what happened. However, I wasn't completely aware of it. Thanks.

> -- Ken
>
> > Exception in thread "main" java.lang.NoClassDefFoundError: Could not
> > initialize class org.apache.tika.parser.dwg.DWGParser
> >
> >     at java.lang.Class.forName0(Native Method)
> >     at java.lang.Class.forName(Class.java:247)
> >     at sun.misc.Service$LazyIterator.next(Service.java:271)
> >     at org.apache.nutch.parse.tika.TikaConfig.<init>(TikaConfig.java:149)
> >     at org.apache.nutch.parse.tika.TikaConfig.getDefaultConfig(TikaConfig.java:211)
> >     at org.apache.nutch.parse.tika.TikaParser.setConf(TikaParser.java:254)
> >     at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
> >     at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:132)
> >     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
> >     at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:101)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)
> >
> > Nick told me to remove DWG from the org.apache.tika.parsers.Parsers
> > config file, which i did. But then other dependency issues come and go.
> > The more parsers i remove from the config file the better it goes, but
> > then Tika won't build anymore because of failing tests.
> >
> > I asked this on the Nutch list because i wasn't sure anymore how Nutch
> > deals with these its own deps, which you explained well.
> >
> > I'll give up for now :)
> >
> >> On 8 February 2012 13:03, Markus Jelsma <[email protected]> wrote:
> >>> Yes, it looks like it! It should also be upgraded to Tika 1.0. But
> >>> that's something else.
> >>>
> >>> dependencies, dependencies, dependencies....
> >>> :(
> >>>
> >>> On Wednesday 08 February 2012 14:04:26 Julien Nioche wrote:
> >>>> The dependencies for the plugins are defined locally as shown in the
> >>>> URL below, where you can see the ref to tika-parsers for parse-tika.
> >>>> Is that more clear for you Markus?
> >>>>
> >>>> On 8 February 2012 12:58, Lewis John Mcgibbney <[email protected]> wrote:
> >>>>> Hi Markus,
> >>>>>
> >>>>> For starters
> >>>>> http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup
> >>>>>
> >>>>> Can we pick our way through this?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> On Wed, Feb 8, 2012 at 12:50 PM, Markus Jelsma <[email protected]> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Can anyone shed light on this? We don't have any parsers in our libs dir
> >>>>>> and we don't have tika-parsers jar, only the tika-core jar. Where are
> >>>>>> the parsers and how does this all work?
> >>>>>>
> >>>>>> I've posted a question (same subject) on the Tika list and Nick tells me
> >>>>>> there must be parsers somewhere. Well, i have no idea how we do it in
> >>>>>> Nutch, do you?
> >>>>>>
> >>>>>> Thanks
> >>>>>
> >>>>> --
> >>>>> *Lewis*
> >>>
> >>> --
> >>> Markus Jelsma - CTO - Openindex
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr

--
Markus Jelsma - CTO - Openindex
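
For reference, the failure in the trace above (a parser class that is listed for loading but cannot be initialized because jars it depends on are missing) can be probed with a small stand-alone check. This is only a rough sketch, not anything from Nutch or Tika: the class name ProviderCheck and the method findUnloadable are made up for illustration, and it assumes you pass it the class names from the parser config file yourself and run it with the same classpath as the failing job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Nutch or Tika): reports which of the
// given class names cannot be loaded and initialized on this classpath.
public class ProviderCheck {

    public static List<String> findUnloadable(List<String> classNames) {
        List<String> failed = new ArrayList<>();
        ClassLoader loader = ProviderCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=true runs static initializers, which is where a
                // missing transitive dependency usually shows up.
                Class.forName(name, true, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers NoClassDefFoundError and
                // ExceptionInInitializerError.
                failed.add(name);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Pass fully qualified class names as arguments.
        for (String name : findUnloadable(Arrays.asList(args))) {
            System.out.println("cannot load: " + name);
        }
    }
}
```

Running it with, say, org.apache.tika.parser.dwg.DWGParser as an argument should list that class if its dependencies are not on the classpath. It only tells you which classes fail, not which jar is missing.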

