[
https://issues.apache.org/jira/browse/NUTCH-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-1609.
-----------------------------------------
Resolution: Won't Fix
Thank you for reporting this.
There are, however, two issues here:
1) The Crawler class is deprecated, and we strongly advise against using it.
2) The SqlStore is deprecated, and we strongly advise against using it.
Alternative data stores are available, and we strongly advise you to use one
of those instead if possible.
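As a rough illustration of the second point, one alternative to SqlStore is a different Gora backend such as HBase. The resolution does not name a specific store, so the choice of HBaseStore below is an assumption, as are a working HBase installation and the gora-hbase dependency being on the Nutch classpath. A minimal sketch of the setting in conf/gora.properties:

```properties
# Assumption: HBase is chosen as the replacement backend; the resolution does not name one.
# Sets the default Gora DataStore implementation, replacing org.apache.gora.sql.store.SqlStore.
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
```

Nutch 2.x also reads the store class from the storage.data.store.class property in conf/nutch-site.xml, so that property would normally be set to the same class name.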
> java.net.MalformedURLException when running nutch crawl with
> apache-nutch-2.1.jar with hadoop
> -----------------------------------------------------------------------------------------------
>
> Key: NUTCH-1609
> URL: https://issues.apache.org/jira/browse/NUTCH-1609
> Project: Nutch
> Issue Type: Bug
> Environment: nutch 2.1
> hadoop 1.0.3
> Reporter: vishal toshniwal
>
> I am getting java.net.MalformedURLException when running "crawl" for Nutch
> 2.1 with Hadoop. It works fine in local mode.
> The command and the resulting exception are as follows:
> bin/hadoop jar apache-nutch-2.1.job org.apache.nutch.crawl.Crawler urls2 -dir crawled -depth 3 -topN 5
> Warning: $HADOOP_HOME is deprecated.
> ****hdfs://localhost:9000/user/impadmin/crawled
> java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.net.MalformedURLException
> at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.IOException: java.io.IOException: java.net.MalformedURLException
> at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
> at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
> at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
> ... 9 more
> Caused by: java.io.IOException: java.net.MalformedURLException
> at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:878)
> at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:163)
> at org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:181)
> at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:222)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
> at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
> at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
> ... 11 more
> Caused by: java.net.MalformedURLException
> at java.net.URL.<init>(URL.java:601)
> at java.net.URL.<init>(URL.java:464)
> at java.net.URL.<init>(URL.java:413)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
> at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
> at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
> at org.jdom.input.SAXBuilder.build(SAXBuilder.java:807)
> at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:847)
> ... 19 more
> Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: 1373549310-1607767962, jobid=job_201307111857_0002
> at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
> at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
> at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
> at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
> at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> bin/nutch crawl urls -depth 3 -topN 5