Could you produce a thread dump while the code is in the infinite loop? On
UNIX, kill -QUIT <pid> should work. On Windows, hit Ctrl-Break in the
command window.
I would guess that it's stuck in the loop that ends with
while (gapfrom != null)
in WikiDownloader.addPages [1].
We download the mappings from the MediaWiki API in batches of 50. With each
batch, MediaWiki also sends the name of the page that starts the next batch
in the attribute 'gapfrom', but sometimes that doesn't work and we keep
downloading the same batch again and again. I think I once ran into this
problem as well, but I thought I had fixed it. I don't remember exactly what
the problem was. Are you sure you are using the latest version of the code?
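
To illustrate the kind of guard I have in mind, here is a minimal sketch. It
is not the actual WikiDownloader code: Batch, downloadAll and fetch are
made-up names, and fetch stands in for one API request. The idea is simply to
remember every 'gapfrom' value we have already followed and to stop as soon as
one comes back a second time.

```scala
import scala.collection.mutable

object GapfromLoopSketch {
  // Hypothetical stand-in for one API response: the page titles of a batch
  // plus the optional 'gapfrom' value that names the start of the next batch.
  final case class Batch(pages: Seq[String], gapfrom: Option[String])

  // Follows 'gapfrom' from batch to batch. Stops when no 'gapfrom' is
  // returned, or when a value comes back a second time (no progress).
  def downloadAll(fetch: Option[String] => Batch): Seq[String] = {
    val seen = mutable.Set[String]()
    val pages = mutable.ArrayBuffer[String]()
    var from: Option[String] = None
    var done = false
    while (!done) {
      val batch = fetch(from)
      pages ++= batch.pages
      batch.gapfrom match {
        case Some(next) if !seen.contains(next) =>
          seen += next
          from = Some(next)
        case Some(next) =>
          System.err.println(s"'gapfrom' value '$next' repeated, stopping")
          done = true
        case None =>
          done = true
      }
    }
    pages.toSeq
  }
}
```

With a guard like this, a server that keeps answering with the same 'gapfrom'
would end the download after one repeated batch instead of growing the file
forever. Whether stopping or failing loudly is the better reaction is a
separate question.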
Regards,
JC
[1]
http://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/util/WikiDownloader.scala
On Aug 1, 2014 8:38 AM, "Dimitris Kontokostas" <[email protected]> wrote:
> Hi Omri,
>
> I tried this again, but it ran successfully every time.
>
> Can you try again, and then once more after removing ".par" in [1]?
> If only the second option works, I guess you can make a pull request to
> remove it.
> We call this only once in a while, so taking a little longer is acceptable.
>
> (Or if you have time to look at that code block and try to fix the issue)
>
> Best,
> Dimitris
> [1]
> https://github.com/jimkont/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/wikipedia/GenerateWikiSettings.scala#L78-78
>
>
> On Wed, Jun 18, 2014 at 6:15 PM, Omri Oren <[email protected]> wrote:
>
>> Hi everyone,
>>
>> I know I wrote about this in the past, but I'm still getting these
>> infinitely growing mapping xml files whenever I run download-mappings.
>>
>> *Does it finish running for any of you?*
>> Maybe I'm doing something wrong?
>>
>> I suspect there are some circular links in the mappings of the
>> problematic languages (i.e. pl, ko, ja, bg, ar, sr and el)
>>
>> Possible workaround: if we can't identify the infinite loop in the code
>> (or fix the mappings themselves), maybe there's a way to stop it after
>> it reaches X entries? Otherwise the script never ends, and if I press
>> Ctrl-C the XML files are left corrupted.
>>
>> Thanks,
>> Omri
>>
>>
>>
>> <http://everything.me/>
>> Omri Oren
>> Algorithm Engineer, EverythingMe <http://everything.me/>
>> [email protected] <[email protected]>
>>
>>
>> On 31 March 2014 14:01, Omri Oren <[email protected]> wrote:
>>
>>> Hello all,
>>>
>>> I've been trying to download the DBpedia mappings today (with a fresh
>>> clone of the DBpedia extraction repo) and the process seemed to never
>>> finish.
>>> I found out that some of the languages' mapping XML files became huge (e.g.
>>> Mapping_ja.xml was ~100MB) and kept on growing until I killed the process.
>>> (see dump below)
>>> It seems some mappings just keep repeating inside the xml.
>>> The problematic languages are pl, ko, ja, bg, ar, sr and el.
>>>
>>> Has anyone else seen this?
>>> Any clue as to how to solve it or bypass it? (it's either a bug in the
>>> mappings api, or in the reader, I guess)
>>>
>>> Thanks,
>>> Omri
>>>
>>>
>>>
>>>
>>>
>>> $ ../run download-mappings
>>> [INFO] Scanning for projects...
>>> [INFO]
>>>
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] Building DBpedia Core Libraries 4.0-SNAPSHOT
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO]
>>> [INFO] >>> scala-maven-plugin:3.1.6:run (default-cli) @ core >>>
>>> [INFO]
>>> [INFO] --- maven-enforcer-plugin:1.3.1:enforce (default) @ core ---
>>> [INFO]
>>> [INFO] --- maven-resources-plugin:2.3:resources (default-resources) @
>>> core ---
>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>> [INFO] skip non existing resourceDirectory
>>> dbpedia_git/core/src/main/resources
>>> [INFO]
>>> [INFO] --- scala-maven-plugin:3.1.6:compile (process-resources) @ core
>>> ---
>>> [WARNING] Expected all dependencies to require Scala version: 2.10.3
>>> [WARNING] org.dbpedia.extraction:core:4.0-SNAPSHOT requires scala
>>> version: 2.10.3
>>> [WARNING] org.scalatest:scalatest_2.10:1.9.1 requires scala version:
>>> 2.10.3
>>> [WARNING] org.scala-lang:scala-actors:2.10.3 requires scala version:
>>> 2.10.3
>>> [WARNING] net.liftweb:lift-json_2.10:2.5.1 requires scala version:
>>> 2.10.3
>>> [WARNING] net.liftweb:lift-json_2.10:2.5.1 requires scala version:
>>> 2.10.0
>>> [WARNING] Multiple versions of scala libraries detected!
>>> [INFO] Nothing to compile - all classes are up to date
>>> [INFO]
>>> [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ core ---
>>> [INFO] Nothing to compile - all classes are up to date
>>> [INFO]
>>> [INFO] --- scala-maven-plugin:3.1.6:compile (compile) @ core ---
>>> [WARNING] Expected all dependencies to require Scala version: 2.10.3
>>> [WARNING] org.dbpedia.extraction:core:4.0-SNAPSHOT requires scala
>>> version: 2.10.3
>>> [WARNING] org.scalatest:scalatest_2.10:1.9.1 requires scala version:
>>> 2.10.3
>>> [WARNING] org.scala-lang:scala-actors:2.10.3 requires scala version:
>>> 2.10.3
>>> [WARNING] net.liftweb:lift-json_2.10:2.5.1 requires scala version:
>>> 2.10.3
>>> [WARNING] net.liftweb:lift-json_2.10:2.5.1 requires scala version:
>>> 2.10.0
>>> [WARNING] Multiple versions of scala libraries detected!
>>> [INFO] Nothing to compile - all classes are up to date
>>> [INFO]
>>> [INFO] --- maven-resources-plugin:2.3:testResources
>>> (default-testResources) @ core ---
>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>> [INFO] Copying 25 resources
>>> [INFO]
>>> [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @
>>> core ---
>>> [INFO] Nothing to compile - all classes are up to date
>>> [INFO]
>>> [INFO] --- scala-maven-plugin:3.1.6:testCompile (test-compile) @ core ---
>>> [WARNING] Expected all dependencies to require Scala version: 2.10.3
>>> [WARNING] org.dbpedia.extraction:core:4.0-SNAPSHOT requires scala
>>> version: 2.10.3
>>> [WARNING] org.scalatest:scalatest_2.10:1.9.1 requires scala version:
>>> 2.10.3
>>> [WARNING] org.scala-lang:scala-actors:2.10.3 requires scala version:
>>> 2.10.3
>>> [WARNING] net.liftweb:lift-json_2.10:2.5.1 requires scala version:
>>> 2.10.3
>>> [WARNING] net.liftweb:lift-json_2.10:2.5.1 requires scala version:
>>> 2.10.0
>>> [WARNING] Multiple versions of scala libraries detected!
>>> [INFO] Nothing to compile - all classes are up to date
>>> [INFO]
>>> [INFO] <<< scala-maven-plugin:3.1.6:run (default-cli) @ core <<<
>>> [INFO]
>>> [INFO] --- scala-maven-plugin:3.1.6:run (default-cli) @ core ---
>>> [WARNING] Expected all dependencies to require Scala version: 2.10.3
>>> [WARNING] org.dbpedia.extraction:core:4.0-SNAPSHOT requires scala
>>> version: 2.10.3
>>> [WARNING] org.scalatest:scalatest_2.10:1.9.1 requires scala version:
>>> 2.10.3
>>> [WARNING] org.scala-lang:scala-actors:2.10.3 requires scala version:
>>> 2.10.3
>>> [WARNING] net.liftweb:lift-json_2.10:2.5.1 requires scala version:
>>> 2.10.3
>>> [WARNING] net.liftweb:lift-json_2.10:2.5.1 requires scala version:
>>> 2.10.0
>>> [WARNING] Multiple versions of scala libraries detected!
>>> [INFO] launcher 'download-mappings' selected =>
>>> org.dbpedia.extraction.util.MappingsDownloader
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ru.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_de.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_el.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_et.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_es.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_nl.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_hi.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_eo.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_cy.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_zh.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_cs.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ur.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ga.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_bg.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ar.xml
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_en.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_zh.xml in 1.9705708 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_sk.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ur.xml in 2.207869 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_bn.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_et.xml in 2.258747 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_sr.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_eo.xml in 2.9767556 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_eu.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ru.xml in 4.1540837 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_id.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_sk.xml in 2.4665806 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_sl.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_bn.xml in 2.6779795 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_it.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ga.xml in 5.731542 seconds
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_hi.xml in 5.953644 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_hr.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_cy.xml in 6.2881546 seconds
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_eu.xml in 3.4312966 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ca.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_hr.xml in 2.6589563 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ja.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_es.xml in 9.406335 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_pt.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_cs.xml in 9.530285 seconds
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_id.xml in 5.628132 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_pl.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_it.xml in 5.09654 seconds
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ca.xml in 4.712224 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_ko.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_sl.xml in 12.70928 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_tr.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_de.xml in 17.712517 seconds
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_pt.xml in 12.278304 seconds
>>> downloading mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_hu.xml
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_hu.xml in 5.856224 seconds
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_tr.xml in 10.662249 seconds
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_nl.xml in 28.139694 seconds
>>> downloaded mappings from http://mappings.dbpedia.org/api.php to
>>> ../mappings/Mapping_en.xml in 39.06333 seconds
>>> *<here some of the files keep growing infinitely and the process doesn't
>>> stop>*
>>> ^Z
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> Dbpedia-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>
>>
>
>
> --
> Kontokostas Dimitris