[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703291#comment-14703291 ]
Asitang Mishra commented on NUTCH-1486:
---------------------------------------
Hey Lewis,
Your fix for the jar soup did not work for the Naive Bayes plugin: it still
cannot find its classes at runtime. Here is what I got:
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:718)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:340)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
    ... 9 more
2015-08-19 09:27:41,936 ERROR naivebayes.NaiveBayesParseFilter - Error occured while training:: java.lang.IllegalStateException: Job failed!
    at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:257)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:56)
    at org.apache.nutch.parsefilter.naivebayes.NaiveBayesClassifier.createModel(NaiveBayesClassifier.java:99)
    at org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter.train(NaiveBayesParseFilter.java:93)
    at org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter.setConf(NaiveBayesParseFilter.java:148)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:163)
    at org.apache.nutch.plugin.PluginRepository.getOrderedPlugins(PluginRepository.java:441)
    at org.apache.nutch.parse.HtmlParseFilters.<init>(HtmlParseFilters.java:35)
    at org.apache.nutch.parse.html.HtmlParser.setConf(HtmlParser.java:343)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:163)
    at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:136)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
    at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:104)
    at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:46)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
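
In case it helps narrow this down: the first trace shows the lookup going
through Configuration.getClassByName and ending in the JVM's AppClassLoader,
so one quick check is whether the Mahout class is visible to the system class
loader at all, or only to some other loader. Below is a minimal sketch of such
a check. ClassVisibilityCheck and its methods are hypothetical helpers written
for this message, not part of Nutch, Hadoop or Mahout, and it only looks at
the system and thread-context loaders, which may not be the loaders the plugin
actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical diagnostic helper; not part of Nutch, Hadoop or Mahout. */
public class ClassVisibilityCheck {

    /** Returns true if the named class can be located through the given loader. */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Checks visibility from the system loader and the thread context loader. */
    public static Map<String, Boolean> check(String className) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        result.put("system", isVisible(className, ClassLoader.getSystemClassLoader()));
        result.put("context",
            isVisible(className, Thread.currentThread().getContextClassLoader()));
        return result;
    }

    public static void main(String[] args) {
        // Default to the class named in the trace above; any class name can be passed instead.
        String name = args.length > 0
            ? args[0]
            : "org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper";
        System.out.println(name + " -> " + check(name));
    }
}
```

Run with the same classpath as the failing job. If "system" comes back false,
the Mahout jar is presumably not on the classpath that Hadoop's Configuration
resolves against.
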
> Upgrade to Solr 4.10.2
> ----------------------
>
> Key: NUTCH-1486
> URL: https://issues.apache.org/jira/browse/NUTCH-1486
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.6, 2.1
> Environment: Solr 4.0, Nutch trunk 1.6-SNAPSHOT and probably 2.2-SNAPSHOT
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Labels: memex
> Fix For: 1.11
>
> Attachments: NUTCH-1486-1.8.patch, NUTCH-1486-1.9-trunk.patch,
> NUTCH-1486-2.x-v3.patch, NUTCH-1486-2.x.patch, NUTCH-1486-2.x.v2.patch,
> NUTCH-1486-nutchgora.patch, NUTCH-1486-trunk.patch,
> NUTCH-1486-trunk.v2.patch, NUTCH-1486-trunk.v3.patch,
> NUTCH-1486-trunkv4.patch, NUTCH-1486-trunkv5.patch
>
>
> When attempting to configure a 4-core multicore Solr 4.0 instance with the
> Nutch schema-solr4.xml file, I get the following exceptions.
> This has been discussed previously. As I see it, we have two options:
> 1. Keep maintaining both schema options
> 2. Ditch the more complex schema-solr4.xml in favour of vanilla schema.xml
> Thoughts?
> {code}
> SEVERE: Unable to create core: collection4
> org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:721)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:566)
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>     at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
>     at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
>     at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
>     at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
>     at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
>     at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
>     at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
>     at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
>     at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
>     at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
>     at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
>     at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
>     at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
>     at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
>     at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
>     at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
>     at org.eclipse.jetty.server.Server.doStart(Server.java:263)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.eclipse.jetty.start.Main.invokeMain(Main.java:457)
>     at org.eclipse.jetty.start.Main.start(Main.java:602)
>     at org.eclipse.jetty.start.Main.main(Main.java:82)
> Caused by: org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
>     at org.apache.solr.update.UpdateLog.init(UpdateLog.java:236)
>     at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:94)
>     at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:123)
>     at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:476)
>     at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:544)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
>     ... 45 more
> Caused by: org.apache.solr.common.SolrException: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
>     at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
>     at org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:83)
>     at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
>     ... 55 more
> 01-Nov-2012 16:26:15 org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:721)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:566)
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>     at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
>     at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
>     at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
>     at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
>     at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
>     at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
>     at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
>     at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
>     at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
>     at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
>     at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
>     at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
>     at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
>     at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
>     at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
>     at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
>     at org.eclipse.jetty.server.Server.doStart(Server.java:263)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
>     at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.eclipse.jetty.start.Main.invokeMain(Main.java:457)
>     at org.eclipse.jetty.start.Main.start(Main.java:602)
>     at org.eclipse.jetty.start.Main.main(Main.java:82)
> Caused by: org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
>     at org.apache.solr.update.UpdateLog.init(UpdateLog.java:236)
>     at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:94)
>     at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:123)
>     at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:476)
>     at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:544)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
>     ... 45 more
> Caused by: org.apache.solr.common.SolrException: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
>     at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
>     at org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:83)
>     at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
>     ... 55 more
> 01-Nov-2012 16:26:15 org.apache.solr.servlet.SolrDispatchFilter init
> INFO: user.dir=/home/lewis/ASF/solr/example
> 01-Nov-2012 16:26:15 org.apache.solr.servlet.SolrDispatchFilter init
> INFO: SolrDispatchFilter.init() done
> 2012-11-01 16:26:15.228:INFO:oejs.AbstractConnector:Started
> [email protected]:8983
> {code}