vishal toshniwal created NUTCH-1609:
---------------------------------------
Summary: java.net.MalformedURLException when running nutch crawl with apache-nutch-2.1.jar with hadoop
Key: NUTCH-1609
URL: https://issues.apache.org/jira/browse/NUTCH-1609
Project: Nutch
Issue Type: Bug
Environment: nutch 2.1
hadoop 1.0.3
Reporter: vishal toshniwal
I am getting a java.net.MalformedURLException when running "crawl" for Nutch
2.1 with Hadoop, but it works fine in local mode.
The command and the resulting exception are as follows:
bin/hadoop jar apache-nutch-2.1.job org.apache.nutch.crawl.Crawler urls2 -dir crawled -depth 3 -topN 5
Warning: $HADOOP_HOME is deprecated.
****hdfs://localhost:9000/user/impadmin/crawled
java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.net.MalformedURLException
at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.io.IOException: java.net.MalformedURLException
at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
... 9 more
Caused by: java.io.IOException: java.net.MalformedURLException
at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:878)
at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:163)
at org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:181)
at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:222)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
... 11 more
Caused by: java.net.MalformedURLException
at java.net.URL.<init>(URL.java:601)
at java.net.URL.<init>(URL.java:464)
at java.net.URL.<init>(URL.java:413)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:807)
at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:847)
... 19 more
Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: 1373549310-1607767962, jobid=job_201307111857_0002
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
bin/nutch crawl urls -depth 3 -topN 5
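
A possible way to narrow this down (a sketch, not a confirmed diagnosis): the innermost frames show Xerces constructing a java.net.URL while SqlStore.readMapping parses its mapping file through JDOM. One hypothesis is that the mapping resource is visible on the classpath in local mode but not inside the Hadoop task. The small check below reports whether a named resource can be located by the current class loader. The class name ClasspathCheck and the default resource name gora-sql-mapping.xml are assumptions made for illustration and are not taken from this report.

```java
import java.net.URL;

// Hypothetical helper, not part of Nutch or Gora.
public class ClasspathCheck {

    /** Returns the URL of the named classpath resource, or null if it is not visible. */
    static URL locate(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the loader that loaded this class.
            cl = ClasspathCheck.class.getClassLoader();
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        // Assumed default mapping file name; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "gora-sql-mapping.xml";
        URL url = locate(name);
        if (url == null) {
            System.out.println(name + " NOT found on classpath");
        } else {
            System.out.println(name + " found at " + url);
        }
    }
}
```

Running the same check with the classpath used in local mode and with the contents of the .job file on the classpath would show whether the resource is visible in both cases.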