Great...maybe this is a bug in the Tika codebase!

On Thu, Nov 20, 2014 at 10:02 AM, MengYing Wang <mengyingwa...@gmail.com> wrote:
> Dear Lewis,
>
> Problem solved by switching rome-1.0.jar back to rome-0.9.jar in
> parse-tika.
> Same idea as the feed parser in
> https://issues.apache.org/jira/browse/NUTCH-1494. Thanks.
>
> Best,
> Mengying (Angela) Wang
>
> On Wed, Nov 19, 2014 at 9:08 PM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Try removing 0.9 from that directory (copy it elsewhere) and attempt to
>> re-parse the directory.
>> Thanks
>>
>> On Wed, Nov 19, 2014 at 8:36 PM, MengYing Wang <mengyingwa...@gmail.com>
>> wrote:
>>
>>> Dear Lewis,
>>>
>>> In feed, it is rome-0.9 (
>>> http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/feed/ivy.xml).
>>> In parse-tika, however, it is rome-1.0 (
>>> http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/parse-tika/plugin.xml).
>>> I have enabled both feed and parse-tika in nutch-site.xml. Thanks.
>>>
>>> Best,
>>> Mengying (Angela) Wang
>>>
>>> On Wed, Nov 19, 2014 at 8:42 AM, Lewis John Mcgibbney <
>>> lewis.mcgibb...@gmail.com> wrote:
>>>
>>>> Which version of the ROME feed parser is in your class path?
>>>> It may be activated via the Nutch 'feed' plugin or may also come via
>>>> the Nutch 'parse-tika' plugin.
>>>> Please determine which version(s) are in the class path and which are
>>>> being used.
>>>>
>>>> On Wednesday, November 19, 2014, MengYing Wang <mengyingwa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> In the Nutch parse step, I received the following error. Does anyone
>>>>> know how to solve the problem? I appreciate your help!
>>>>>
>>>>> $ /cygdrive/d/nutch_trunk/runtime/local/bin/nutch parse -D
>>>>> mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D
>>>>> mapred.reduce.tasks.speculative.execution=false -D
>>>>> mapred.map.tasks.speculative.execution=false -D
>>>>> mapred.compress.map.output=true -D
>>>>> mapred.skip.attempts.to.start.skipping=2
>>>>> -D mapred.skip.map.max.skip.records=1 crawlId/segments/20141118235323
>>>>>
>>>>> java.lang.ExceptionInInitializerError
>>>>>     at com.sun.syndication.io.SyndFeedInput.build(SyndFeedInput.java:136)
>>>>>     at org.apache.tika.parser.feed.FeedParser.parse(FeedParser.java:70)
>>>>>     at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:103)
>>>>>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:95)
>>>>>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:101)
>>>>>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>> Caused by: java.lang.NullPointerException
>>>>>     at java.util.Properties$LineReader.readLine(Properties.java:434)
>>>>>     at java.util.Properties.load0(Properties.java:353)
>>>>>     at java.util.Properties.load(Properties.java:341)
>>>>>     at com.sun.syndication.io.impl.PropertiesLoader.<init>(PropertiesLoader.java:74)
>>>>>     at com.sun.syndication.io.impl.PropertiesLoader.getPropertiesLoader(PropertiesLoader.java:46)
>>>>>     at com.sun.syndication.io.impl.PluginManager.<init>(PluginManager.java:54)
>>>>>     at com.sun.syndication.io.impl.PluginManager.<init>(PluginManager.java:46)
>>>>>     at com.sun.syndication.feed.synd.impl.Converters.<init>(Converters.java:40)
>>>>>     at com.sun.syndication.feed.synd.SyndFeedImpl.<clinit>(SyndFeedImpl.java:59)
>>>>>     ... 10 more
>>>>>
>>>>> --
>>>>> Best,
>>>>> Mengying (Angela) Wang
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "nsf-polar-usc-students" group.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/nsf-polar-usc-students/CAJX%3DLAuzcTtYe61Avq1EthNRYN6M-%2BGk%2B7PntdOYvQ4ZkrEJKw%40mail.gmail.com
>>>>>
>>>>
>>>> --
>>>> *Lewis*
>>>>
>>>
>>> --
>>> Best,
>>> Mengying (Angela) Wang
>>>
>>
>> --
>> *Lewis*
>>
>
> --
> Best,
> Mengying (Angela) Wang
>

--
*Lewis*
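
For anyone hitting the same ExceptionInInitializerError, here is a minimal sketch of one way to check which jar a ROME class is actually loaded from, which is the question raised earlier in the thread. The class name `WhichJar` and the method `locate` are hypothetical names made up for this illustration; only the default class name `com.sun.syndication.io.SyndFeedInput` comes from the stack trace. The sketch assumes you run it with the same jars on the classpath that you want to inspect. Nutch plugins are loaded through their own class loaders, so a standalone run may not match what the parse job sees.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Nutch, Tika, or ROME): reports where a class comes from.
class WhichJar {

    /** Returns the jar or directory a class was loaded from, or a short note if unknown. */
    public static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Default to the ROME class named in the stack trace; pass another name to override.
        String name = args.length > 0 ? args[0] : "com.sun.syndication.io.SyndFeedInput";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Running it with the plugin's jars on the classpath (for example via `java -cp`) should print the path of whichever rome jar wins. If both rome-0.9 and rome-1.0 are present, the one listed first on the classpath is normally the one reported.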