[
https://issues.apache.org/jira/browse/NUTCH-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707955#comment-13707955
]
vishal toshniwal commented on NUTCH-1609:
-----------------------------------------
We also tried integrating with HBase 9.0.3, but we get the same
"java.net.MalformedURLException" there as well.
> java.net.MalformedURLException when running nutch crawl with
> apache-nutch-2.1.jar with hadoop
> -----------------------------------------------------------------------------------------------
>
> Key: NUTCH-1609
> URL: https://issues.apache.org/jira/browse/NUTCH-1609
> Project: Nutch
> Issue Type: Bug
> Environment: nutch 2.1
> hadoop 1.0.3
> Reporter: vishal toshniwal
>
> I am getting java.net.MalformedURLException when running "crawl" for Nutch
> 2.1 with Hadoop, although it works fine in local mode.
> The command and the resulting exception are as follows:
> bin/hadoop jar apache-nutch-2.1.job org.apache.nutch.crawl.Crawler urls2 -dir crawled -depth 3 -topN 5
> Warning: $HADOOP_HOME is deprecated.
> ****hdfs://localhost:9000/user/impadmin/crawled
> java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.net.MalformedURLException
> at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.IOException: java.io.IOException: java.net.MalformedURLException
> at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
> at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
> at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
> ... 9 more
> Caused by: java.io.IOException: java.net.MalformedURLException
> at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:878)
> at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:163)
> at org.apache.gora.store.impl.DataStoreBase.readFields(DataStoreBase.java:181)
> at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:222)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
> at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
> at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
> ... 11 more
> Caused by: java.net.MalformedURLException
> at java.net.URL.<init>(URL.java:601)
> at java.net.URL.<init>(URL.java:464)
> at java.net.URL.<init>(URL.java:413)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
> at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
> at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
> at org.jdom.input.SAXBuilder.build(SAXBuilder.java:807)
> at org.apache.gora.sql.store.SqlStore.readMapping(SqlStore.java:847)
> ... 19 more
> Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: 1373549310-1607767962, jobid=job_201307111857_0002
> at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
> at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
> at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
> at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
> at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> bin/nutch crawl urls -depth 3 -topN 5
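
A possible reading of the trace, offered as an assumption rather than a confirmed diagnosis: the innermost frames show SAXBuilder.build being called from SqlStore.readMapping and ending in a MalformedURLException with no message. An XML parser can fail this way when it is handed an empty input source, for example when a mapping file cannot be found on the task classpath in distributed mode. The sketch below is a minimal, hypothetical check (the class MappingCheck and the method openResource are illustrative names, and the file name gora-sql-mapping.xml is assumed to be the mapping file in use) that looks a resource up on the classpath and reports a clear error if it is missing, instead of passing null on to the parser.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Nutch or Gora.
public class MappingCheck {

    // Opens a classpath resource, or throws a descriptive error when the
    // class loader cannot find it (getResourceAsStream returns null then).
    public static InputStream openResource(ClassLoader loader, String name)
            throws IOException {
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException(
                "Resource not found on classpath: " + name);
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        // Assumed mapping file name; pass a different one as the first argument.
        String name = args.length > 0 ? args[0] : "gora-sql-mapping.xml";
        try (InputStream in =
                 openResource(MappingCheck.class.getClassLoader(), name)) {
            System.out.println("Found " + name);
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running a check like this with the same classpath as the failing map task (for instance, with the contents of the .job file on the classpath) would show whether the mapping file is visible there. If it is found, the cause lies elsewhere.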