Thanks. As I expected, the problem is that the XML parser tries to download a schema or DTD from www.w3.org, probably to validate the XML returned by the local MediaWiki.
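One common way to keep a JAXP parser from fetching external DTDs is to install an entity resolver that answers every external reference with an empty document. The following is only a minimal Java sketch, assuming a plain `DocumentBuilder`; this thread does not establish which parser the extraction framework actually uses, and the names `OfflineXmlParser`, `newOfflineBuilder` and `parse` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the extraction framework.
final class OfflineXmlParser {

    // Builds a non-validating parser whose entity resolver replaces every
    // external DTD or entity with an empty document, so nothing is fetched.
    static DocumentBuilder newOfflineBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder;
    }

    // Parses an XML string without touching the network.
    static Document parse(String xml) throws Exception {
        return newOfflineBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE points at www.w3.org, but the resolver short-circuits it.
        String xml =
            "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE api PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<api><parse><text>This is a text for testing</text></parse></api>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The trade-off is that entities declared only in the skipped DTD (for example `&nbsp;` in XHTML) are no longer defined, so a document that uses them would fail to parse with this resolver.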
I'd like to look into it, but I don't know if I'll have time, so any help is welcome. This discussion may help:

http://stackoverflow.com/questions/6539051/how-can-i-tell-xalan-not-to-validate-xml-retreived-using-the-document-function

Cheers,
JC

On Mon, Mar 11, 2013 at 5:37 AM, Riko Adi Prasetya <[email protected]> wrote:
> Hi Jona,
>
> Sorry for the late reply.
> I have attached the error log.
>
> Regards,
> Riko
>
> ________________________________
> From: Jona Christopher Sahnwaldt <[email protected]>
> To: Riko Adi Prasetya <[email protected]>
> Cc: Dimitris Kontokostas <[email protected]>; "[email protected]" <[email protected]>; Jose Emilio Labra Gayo <[email protected]>
> Sent: Thursday, 7 March 2013 18:01
> Subject: Re: [Dbpedia-discussion] Abstract extraction problem
>
> Hi Riko,
>
>> - java.net.UnknownHostException: www.w3.org
> This is weird. Could you please send us the whole stack trace? I don't
> think the extraction framework should try to access anything but
> localhost. Could be some kind of XML schema thing. If it is, we should
> probably turn it off.
>
> I still don't quite understand why you have to tell your JVM not to
> use a proxy for localhost. I guess the JVM picks up the proxy
> configuration from the operating system. Maybe you should configure
> the OS such that no proxy is used for localhost.
>
> Cheers,
> JC
>
> On Thu, Mar 7, 2013 at 11:36 AM, Riko Adi Prasetya <[email protected]> wrote:
>> Hi Dimitris,
>>
>> I use my campus' internet connection, which requires a proxy. So I must
>> configure it in extraction-framework/dump/pom.xml.
>> I configure it like this:
>>
>> <launcher>
>>   <id>extraction</id>
>>   <mainClass>org.dbpedia.extraction.dump.extract.Extraction</mainClass>
>>   <jvmArgs>
>>     <jvmArg>-server</jvmArg>
>>     <jvmArg>-Xmx1024m</jvmArg>
>>     <jvmArg>-Dhttp.proxyHost=152.118.24.10</jvmArg>
>>     <jvmArg>-Dhttp.proxyPort=8080</jvmArg>
>>     <jvmArg>-Dhttp.nonProxyHosts="localhost|152.118.*.*|*.ui.ac.id"</jvmArg>
>>   </jvmArgs>
>> </launcher>
>>
>> Before I solved this problem, I got these error messages:
>> - java.net.ConnectException: Connection timed out
>> - java.net.SocketException: Invalid argument or cannot assign requested address
>> - java.net.UnknownHostException: www.w3.org
>> - java.lang.Exception: Could not retrieve abstract for page: title=Daftar filsuf;ns=0/Main/;language:wiki=id,locale=in
>>
>> I have sent a pull request.
>>
>> Thank you, Dimitris and Jona.
>>
>> Regards,
>> Riko
>>
>> ________________________________
>> From: Dimitris Kontokostas <[email protected]>
>> To: Riko Adi Prasetya <[email protected]>
>> Cc: Jona Sahnwaldt <[email protected]>; "[email protected]" <[email protected]>; Jose Emilio Labra Gayo <[email protected]>
>> Sent: Tuesday, 5 March 2013 23:09
>> Subject: Re: [Dbpedia-discussion] Abstract extraction problem
>>
>> Hi Riko,
>>
>> We had similar (proxy) problems in the past but we didn't document them
>> anywhere. Would you mind writing how you bypassed the proxy issue?
>>
>> You could make a pull request with your proxy-pom configuration (as a
>> comment) and drop a couple of lines explaining it here:
>>
>> https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions
>>
>> And of course, you can also add everything that you had to figure out on
>> your own :)
>>
>> Thanks
>> Dimitris
>>
>> On Tue, Mar 5, 2013 at 5:36 PM, Riko Adi Prasetya <[email protected]> wrote:
>>
>> Hi Dimitris and Jona,
>>
>> Thanks for your reply.
>> I found the problem.
>> I forgot to configure the proxy in extraction-framework/pom.xml.
>>
>> Regards,
>> Riko
>>
>> ________________________________
>> From: Dimitris Kontokostas <[email protected]>
>> To: riko adi prasetya <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Sent: Monday, 4 March 2013 22:06
>> Subject: Re: [Dbpedia-discussion] Abstract extraction problem
>>
>> Hi Riko,
>>
>> I updated the settings in the repository (although I don't think this is
>> it), but can you pull and retry?
>> If the problem persists, can you try to debug it and see where exactly in
>> the retrievePage() function the problem is?
>> E.g. test the generated URL and see what you get.
>>
>> Best,
>> Dimitris
>>
>> On Mon, Mar 4, 2013 at 2:54 PM, riko adi prasetya <[email protected]> wrote:
>>
>> Hi all,
>>
>> I have a problem when trying to use AbstractExtractor. I followed the
>> instructions from [1] to set up a local MediaWiki instance. Then I tested
>> the local MediaWiki instance like this:
>>
>> http://localhost/mw-modified/api.php?uselang=id&action=parse&text=[[This]]%20is%20a%20[[Test_text|text%20for%20testing]]
>>
>> The result:
>>
>> <?xml version="1.0"?>
>> <api>
>>   <parse>
>>     <text xml:space="preserve">This is a text for testing</text>
>>   </parse>
>> </api>
>>
>> I think there is no problem with my local MediaWiki instance, but some
>> problems appear when I run AbstractExtractor.
>>
>> I replaced this line
>>
>> private val apiUrl = "http://localhost/mediawiki/api.php"
>>
>> with
>>
>> private val apiUrl = "http://localhost/mw-modified/api.php"
>>
>> Example of the error:
>>
>> Mar 04, 2013 6:24:24 PM org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1 apply
>> WARNING: error processing page 'title=Daftar negara bagian di Jerman;ns=0/Main/;language:wiki=id,locale=in'
>> java.lang.Exception: Could not retrieve abstract for page: title=Daftar negara bagian di Jerman;ns=0/Main/;language:wiki=id,locale=in
>>     at org.dbpedia.extraction.mappings.AbstractExtractor.retrievePage(AbstractExtractor.scala:134)
>>     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:66)
>>     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:21)
>>     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>>     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>>     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>>     at scala.collection.immutable.List.foreach(List.scala:76)
>>     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
>>     at scala.collection.immutable.List.flatMap(List.scala:76)
>>     at org.dbpedia.extraction.mappings.CompositeMapping.extract(CompositeMapping.scala:13)
>>     at org.dbpedia.extraction.mappings.RootExtractor.apply(RootExtractor.scala:23)
>>     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:29)
>>     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:25)
>>     at org.dbpedia.extraction.util.SimpleWorkers$$anonfun$apply$1$$anon$2.process(Workers.scala:23)
>>     at org.dbpedia.extraction.util.Workers$$anonfun$1$$anon$1.run(Workers.scala:131)
>>
>> Any idea how to fix this problem? Thank you!
>>
>> [1] https://github.com/dbpedia/dbpedia/tree/master/abstractExtraction
>>
>> ________________________________
>> Riko Adi Prasetya
>> Faculty of Computer Science
>> Universitas Indonesia
>>
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>> --
>> Kontokostas Dimitris
