[ 
https://issues.apache.org/jira/browse/NUTCH-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-1609.
-----------------------------------------

    Resolution: Won't Fix

Thank you for reporting this.
There are, however, two issues here:
1) The Crawler class is deprecated and we strongly advise against using it.
2) The SqlStore is deprecated and we strongly advise against using it. There are 
alternatives available and we would very strongly advise you to use one of 
them instead if possible.
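
For illustration only, switching away from SqlStore in Nutch 2.x is typically a 
configuration change. The fragment below is a minimal sketch that assumes HBase 
as the replacement store; that choice is an assumption made for this example and 
is not named in the resolution above. Any other supported Gora store would be 
configured the same way with its own class name.

```xml
<!-- conf/nutch-site.xml: illustrative fragment only -->
<configuration>
  <property>
    <name>storage.data.store.class</name>
    <!-- Assumption: HBaseStore is used here as the example replacement for SqlStore -->
    <value>org.apache.gora.hbase.store.HBaseStore</value>
  </property>
</configuration>
```

The gora.datastore.default key in conf/gora.properties would normally be set to 
the same class, and the matching Gora module has to be on the classpath when the 
job file is built.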

                
>  java.net.MalformedURLException when running nutch crawl with 
> apache-nutch-2.1.jar with hadoop 
> -----------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1609
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1609
>             Project: Nutch
>          Issue Type: Bug
>         Environment: nutch 2.1
> hadoop 1.0.3
>            Reporter: vishal toshniwal
>
> I am getting a java.net.MalformedURLException when running "crawl" for Nutch 
> 2.1 with Hadoop, but it works fine in local mode.
> The exception is as follows:
> bin/hadoop jar apache-nutch-2.1.job org.apache.nutch.crawl.Crawler urls2 -dir 
> crawled -depth 3 -topN 5
> Warning: $HADOOP_HOME is deprecated.
> ****hdfs://localhost:9000/user/impadmin/crawled
> java.lang.RuntimeException: java.io.IOException: java.io.IOException: 
> java.net.MalformedURLException
>       at 
> org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
>       at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>       at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.IOException: java.io.IOException: 
> java.net.MalformedURLException
>       at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
>       at 
> org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
>       at 
> org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
>       ... 9 more
> Caused by: java.io.IOException: java.net.MalformedURLException
>       at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:878)
>       at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:163)
>       at 
> org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:181)
>       at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:222)
>       at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>       at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>       at 
> org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
>       at 
> org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
>       at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
>       ... 11 more
> Caused by: java.net.MalformedURLException
>       at java.net.URL.<init>(URL.java:601)
>       at java.net.URL.<init>(URL.java:464)
>       at java.net.URL.<init>(URL.java:413)
>       at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
>       at 
> org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>       at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
> Source)
>       at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
>       at org.jdom.input.SAXBuilder.build(SAXBuilder.java:807)
>       at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:847)
>       ... 19 more
> Exception in thread "main" java.lang.RuntimeException: job failed: 
> name=generate: 1373549310-1607767962, jobid=job_201307111857_0002
>       at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
>       at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
>       at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>       at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
>       at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> bin/nutch crawl urls -depth 3 -topN 5

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
