Re: [Dbpedia-discussion] Abstract extraction problem

Jona Christopher Sahnwaldt Tue, 12 Mar 2013 09:30:37 -0700

Thanks. As I expected, the problem is that the XML parser tries to
download a schema or DTD from www.w3.org, probably to validate the XML
returned by the local MediaWiki.


I'd like to look into it, but I don't know if I'll have time, so any
help is welcome. This discussion may help:

http://stackoverflow.com/questions/6539051/how-can-i-tell-xalan-not-to-validate-xml-retreived-using-the-document-function
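
For what it's worth, here is a minimal sketch of one way to stop a JVM XML
parser from fetching a DTD over the network. It assumes the MediaWiki response
is parsed through the standard JAXP DocumentBuilder, which this thread does not
confirm. OfflineXml and parse are hypothetical names, not part of the
extraction framework. The resolver answers every external DTD or entity
request with an empty document, so no connection to www.w3.org is attempted:

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.xml.sax.{EntityResolver, InputSource}

// Hypothetical helper, for illustration only.
object OfflineXml {

  // Answers every external DTD/entity lookup with an empty document,
  // so the parser has no reason to open a network connection.
  private val emptyResolver = new EntityResolver {
    override def resolveEntity(publicId: String, systemId: String): InputSource =
      new InputSource(new StringReader(""))
  }

  // Parses an XML string without validating it and without fetching
  // anything referenced by its DOCTYPE declaration.
  def parse(xml: String): Document = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setValidating(false)
    val builder = factory.newDocumentBuilder()
    builder.setEntityResolver(emptyResolver)
    builder.parse(new InputSource(new StringReader(xml)))
  }
}
```

One caveat: with an empty DTD, named entities that the real DTD would declare
(for example &nbsp;) are no longer defined, so input that uses them would
likely fail to parse.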

Cheers,
JC

On Mon, Mar 11, 2013 at 5:37 AM, Riko Adi Prasetya
<[email protected]> wrote:
> Hi Jona,
>
> Sorry for the late reply.
> I have attached error log.
>
> Regards,
> Riko
>
> ________________________________
> From: Jona Christopher Sahnwaldt <[email protected]>
> To: Riko Adi Prasetya <[email protected]>
> Cc: Dimitris Kontokostas <[email protected]>;
> "[email protected]"
> <[email protected]>; Jose Emilio Labra Gayo
> <[email protected]>
> Sent: Thursday, 7 March 2013 18:01
> Subject: Re: [Dbpedia-discussion] Abstract extraction problem
>
> Hi Riko,
>
>> - java.net.UnknownHostException: www.w3.org
> This is weird. Could you please send us the whole stack trace? I don't
> think the extraction framework should try to access anything but
> localhost. Could be some kind of XML schema thing. If it is, we should
> probably turn it off.
>
> I still don't quite understand why you have to tell your JVM not to
> use a proxy for localhost. I guess the JVM picks up the proxy
> configuration from the operating system. Maybe you should configure
> the OS such that no proxy is used for localhost.
>
> Cheers,
> JC
>
> On Thu, Mar 7, 2013 at 11:36 AM, Riko Adi Prasetya
> <[email protected]> wrote:
>> Hi Dimitris,
>>
>> I use my campus internet connection, which requires a proxy, so I must
>> configure it in extraction-framework/dump/pom.xml.
>> I configured it like this:
>> <launcher>
>>     <id>extraction</id>
>>     <mainClass>org.dbpedia.extraction.dump.extract.Extraction</mainClass>
>>     <jvmArgs>
>>         <jvmArg>-server</jvmArg>
>>         <jvmArg>-Xmx1024m</jvmArg>
>>         <jvmArg>-Dhttp.proxyHost=152.118.24.10</jvmArg>
>>         <jvmArg>-Dhttp.proxyPort=8080</jvmArg>
>>         <jvmArg>-Dhttp.nonProxyHosts="localhost|152.118.*.*|*.ui.ac.id"</jvmArg>
>>     </jvmArgs>
>> </launcher>
>>
>> Before I solved this problem, I saw several error messages:
>> - java.net.ConnectException: Connection timed out
>> - java.net.SocketException: Invalid argument or cannot assign requested address
>> - java.net.UnknownHostException: www.w3.org
>> - java.lang.Exception: Could not retrieve abstract for page: title=Daftar filsuf;ns=0/Main/;language:wiki=id,locale=in
>>
>> I have sent a pull request.
>>
>> Thank you Dimitris and Jona
>>
>> Regards,
>> Riko
>>
>> ________________________________
>> From: Dimitris Kontokostas <[email protected]>
>> To: Riko Adi Prasetya <[email protected]>
>> Cc: Jona Sahnwaldt <[email protected]>;
>> "[email protected]"
>> <[email protected]>; Jose Emilio Labra Gayo
>> <[email protected]>
>> Sent: Tuesday, 5 March 2013 23:09
>> Subject: Re: [Dbpedia-discussion] Abstract extraction problem
>>
>> Hi Riko,
>>
>> We had similar (proxy) problems in the past but we didn't document them
>> anywhere. Would you mind writing up how you bypassed the proxy issue?
>>
>> You could make a pull request with your proxy-pom configuration (as a
>> comment) and drop a couple of lines explaining it here:
>>
>> https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions
>>
>> And of course, you can also add everything that you had to figure out on
>> your own:)
>>
>> Thanks
>> Dimitris
>>
>>
>> On Tue, Mar 5, 2013 at 5:36 PM, Riko Adi Prasetya
>> <[email protected]>
>> wrote:
>>
>> Hi Dimitris and Jona,
>>
>> Thanks for your reply.
>> I found the problem. I forgot to configure the proxy in
>> extraction-framework/pom.xml.
>>
>> Regards,
>> Riko
>>
>> ________________________________
>> From: Dimitris Kontokostas <[email protected]>
>> To: riko adi prasetya <[email protected]>
>> Cc: "[email protected]"
>> <[email protected]>
>> Sent: Monday, 4 March 2013 22:06
>> Subject: Re: [Dbpedia-discussion] Abstract extraction problem
>>
>> Hi Riko,
>>
>> I updated the settings in the repository (although I don't think this is
>> it), but can you pull and retry?
>> If the problem persists, can you try to debug it and see where exactly in
>> the retrievePage() function the problem is?
>> E.g. test the generated URL and see what you get.
>>
>> Best,
>> Dimitris
>>
>>
>> On Mon, Mar 4, 2013 at 2:54 PM, riko adi prasetya
>> <[email protected]>
>> wrote:
>>
>>
>>  Hi all,
>>
>> I have a problem when trying to use AbstractExtractor. I followed the
>> instructions from [1] to set up a local MediaWiki instance. Then I tested
>> the local MediaWiki instance like this:
>>
>>
>> http://localhost/mw-modified/api.php?uselang=id&action=parse&text=[[This]]%20is%20a%20[[Test_text|text%20for%20testing]]
>>
>> The result:
>>
>> <?xml version="1.0"?>
>> <api>
>>  <parse>
>>    <text xml:space="preserve">This is a text for testing</text>
>>  </parse>
>> </api>
>>
>> I think there is no problem with my local MediaWiki instance, but a
>> problem appears when I run AbstractExtractor.
>>
>> I replaced this line
>>
>> private val apiUrl = "http://localhost/mediawiki/api.php";
>>
>> with
>>
>> private val apiUrl = "http://localhost/mw-modified/api.php";
>>
>>
>>
>> Example of the error:
>>
>> Mar 04, 2013 6:24:24 PM org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1 apply
>> WARNING: error processing page 'title=Daftar negara bagian di Jerman;ns=0/Main/;language:wiki=id,locale=in'
>> java.lang.Exception: Could not retrieve abstract for page: title=Daftar negara bagian di Jerman;ns=0/Main/;language:wiki=id,locale=in
>>     at org.dbpedia.extraction.mappings.AbstractExtractor.retrievePage(AbstractExtractor.scala:134)
>>     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:66)
>>     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:21)
>>     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>>     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>>     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>>     at scala.collection.immutable.List.foreach(List.scala:76)
>>     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
>>     at scala.collection.immutable.List.flatMap(List.scala:76)
>>     at org.dbpedia.extraction.mappings.CompositeMapping.extract(CompositeMapping.scala:13)
>>     at org.dbpedia.extraction.mappings.RootExtractor.apply(RootExtractor.scala:23)
>>     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:29)
>>     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:25)
>>     at org.dbpedia.extraction.util.SimpleWorkers$$anonfun$apply$1$$anon$2.process(Workers.scala:23)
>>     at org.dbpedia.extraction.util.Workers$$anonfun$1$$anon$1.run(Workers.scala:131)
>>
>> Any idea how to fix this problem ? Thank you !
>>
>>
>> [1] https://github.com/dbpedia/dbpedia/tree/master/abstractExtraction
>>
>> ________________________________
>> Riko Adi Prasetya
>> Faculty of Computer Science
>> Universitas Indonesia
>>
>>
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>>
>>
>>
>> --
>> Kontokostas Dimitris
>>
>>
>
>

