On Feb 8, 2012, at 5:28am, Markus Jelsma wrote:

> On Wednesday 08 February 2012 14:22:36 Julien Nioche wrote:
>> sorry don't understand what your issue is. We have a dependency on
>> tika-parsers and the actual parser implementations (listed in tika parsers'
>> POM) are pulled transitively just like any other dependency managed by Ivy.
>> They end up being copied in runtime/local/plugins/parse-tika/ or put in
>> the job in runtime/deploy/
>
> My problem is that i am working on some code for Tika-parsers 1.1-SNAPSHOT
> that i need to use in Nutch. However, when i build tika-parsers and put it in
> Nutch' lib directory i still seem to be missing dependencies. Then trouble
> begins:

I don't know anything about how Nutch handles jars in its lib directory, but this sounds like you have a "raw" jar (tika-parsers) without its pom.xml. So then Ivy (or Maven) doesn't know about the transitive dependencies on other jars, which are needed to implement the actual parsing support.

-- Ken

> Exception in thread "main" java.lang.NoClassDefFoundError: Could not
> initialize class org.apache.tika.parser.dwg.DWGParser
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:247)
>     at sun.misc.Service$LazyIterator.next(Service.java:271)
>     at org.apache.nutch.parse.tika.TikaConfig.<init>(TikaConfig.java:149)
>     at org.apache.nutch.parse.tika.TikaConfig.getDefaultConfig(TikaConfig.java:211)
>     at org.apache.nutch.parse.tika.TikaParser.setConf(TikaParser.java:254)
>     at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
>     at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:132)
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
>     at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:101)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)
>
> Nick told me to remove DWG from the org.apache.tika.parsers.Parsers config
> file, which i did. But then other dependency issues come and go. The more
> parsers i remove from the config file the better it goes, but then Tika won't
> build anymore because of failing tests.
>
> I asked this on the Nutch list because i wasn't sure anymore how Nutch deals
> with these its own deps, which you explained well.
>
> I'll give up for now :)
>
>> On 8 February 2012 13:03, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>> Yes, it looks like it! It should also be upgraded to Tika 1.0. But that's
>>> something else.
>>>
>>> dependencies, dependencies, dependencies.... :(
>>>
>>> On Wednesday 08 February 2012 14:04:26 Julien Nioche wrote:
>>>> The dependencies for the plugins are defined locally as shown in the
>>>> URL below, where you can see the ref to tika-parsers for parse-tika.
>>>> Is that more clear for you Markus?
>>>>
>>>> On 8 February 2012 12:58, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
>>>>> Hi Markus,
>>>>>
>>>>> For starters
>>>>> http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup
>>>>>
>>>>> Can we pick our way through this?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Feb 8, 2012 at 12:50 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Can anyone shed light on this? We don't have any parsers in our libs dir
>>>>>> and we don't have tika-parsers jar, only the tika-core jar. Where are
>>>>>> the parsers and how does this all work?
>>>>>>
>>>>>> I've posted a question (same subject) on the Tika list and Nick tells me
>>>>>> there must be parsers somewhere. Well, i have no idea how we do it in
>>>>>> Nutch, do you?
>>>>>>
>>>>>> Thanks
>>>>>
>>>>> --
>>>>> *Lewis*
>>>
>>> --
>>> Markus Jelsma - CTO - Openindex
>
> --
> Markus Jelsma - CTO - Openindex

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
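
A possible way to see, in one pass, which parsers are affected by the missing transitive jars described in this thread, rather than hitting them one NoClassDefFoundError at a time: the rough sketch below reads the service registration files on the classpath and tries to load each listed class. It is not part of Nutch or Tika. The class name ServiceEntryCheck and its methods are made up for illustration, and it assumes the config file mentioned in the thread is the standard service file META-INF/services/org.apache.tika.parser.Parser. It only tells you something if it is run with the same jars on the classpath that the parse-tika plugin would see.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper; not part of Nutch or Tika.
public class ServiceEntryCheck {

    /** Strips '#' comments and blank lines from the lines of a service file. */
    public static List<String> parseEntries(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Tries to load and initialize each class; returns the failures keyed by class name. */
    public static Map<String, Throwable> findUnloadable(List<String> classNames, ClassLoader loader) {
        Map<String, Throwable> failures = new LinkedHashMap<>();
        for (String name : classNames) {
            try {
                Class.forName(name, true, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // NoClassDefFoundError (as in the stack trace) is a LinkageError.
                failures.put(name, e);
            }
        }
        return failures;
    }

    /** Reads every META-INF/services/<serviceName> file visible to the loader. */
    public static List<String> readServiceEntries(String serviceName, ClassLoader loader)
            throws IOException {
        List<String> lines = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources("META-INF/services/" + serviceName);
        while (urls.hasMoreElements()) {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    urls.nextElement().openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return parseEntries(lines);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the Tika parser service name; override with the first argument.
        String service = args.length > 0 ? args[0] : "org.apache.tika.parser.Parser";
        ClassLoader loader = ServiceEntryCheck.class.getClassLoader();
        List<String> entries = readServiceEntries(service, loader);
        Map<String, Throwable> failures = findUnloadable(entries, loader);
        System.out.println(entries.size() + " entries, " + failures.size() + " not loadable");
        failures.forEach((name, error) -> System.out.println(name + " -> " + error));
    }
}
```

Each class reported as not loadable points at a parser whose supporting jars are absent, which is what you would expect when tika-parsers is dropped in as a bare jar without the dependencies declared in its POM.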