[ https://issues.apache.org/jira/browse/NUTCH-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710352#comment-13710352 ]

Lewis John McGibbney commented on NUTCH-1609:
---------------------------------------------

You need to use the bin script or the crawl script. The latter combines 
arguments from the former.
If you have problems with Gora in the stack, please head over to user or 
[email protected] and the Gora community can try to help out over there.
                
>  java.net.MalformedURLException when running nutch crawl with 
> apache-nutch-2.1.jar with hadoop 
> -----------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1609
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1609
>             Project: Nutch
>          Issue Type: Bug
>         Environment: nutch 2.1
> hadoop 1.0.3
>            Reporter: vishal toshniwal
>
> I am getting java.net.MalformedURLException when running "crawl" for Nutch 
> 2.1 with Hadoop, but it works fine in local mode.
> The exception follows:
> bin/hadoop jar apache-nutch-2.1.job org.apache.nutch.crawl.Crawler urls2 -dir crawled -depth 3 -topN 5
> Warning: $HADOOP_HOME is deprecated.
> ****hdfs://localhost:9000/user/impadmin/crawled
> java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.net.MalformedURLException
>       at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
>       at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>       at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.IOException: java.io.IOException: java.net.MalformedURLException
>       at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
>       at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
>       at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
>       ... 9 more
> Caused by: java.io.IOException: java.net.MalformedURLException
>       at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:878)
>       at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:163)
>       at org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:181)
>       at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:222)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>       at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
>       at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
>       at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
>       ... 11 more
> Caused by: java.net.MalformedURLException
>       at java.net.URL.<init>(URL.java:601)
>       at java.net.URL.<init>(URL.java:464)
>       at java.net.URL.<init>(URL.java:413)
>       at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
>       at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>       at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>       at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
>       at org.jdom.input.SAXBuilder.build(SAXBuilder.java:807)
>       at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:847)
>       ... 19 more
> Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: 1373549310-1607767962, jobid=job_201307111857_0002
>       at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
>       at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
>       at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>       at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
>       at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> bin/nutch crawl urls -depth 3 -topN 5

