Hi Roberto,

Because it is tedious to download each Wikipedia dump and organize it
in the filesystem manually, we have just introduced the configuration
parameter 'updateDumps'. If you set it to true, the extraction
framework automatically checks the dump directory for existing
dumps and downloads any that are missing or out of date.
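
To illustrate the idea, here is a rough sketch in Python. It is not the
framework's actual code (which is written in Scala), and the function
names are made up for illustration. It assumes the directory layout from
the documentation quoted further down in this thread,
{dumpDir}/{wiki}/{date}/{wiki}wiki-{date}-pages-articles.xml.bz2, and the
URL pattern of the Commons download link. It only covers the "missing"
case, not the up-to-date check.

```python
import os

# Assumed layout, taken from the documentation quoted later in this thread:
#   {dumpDir}/{wiki}/{date}/{wiki}wiki-{date}-pages-articles.xml.bz2
# All function names are hypothetical; they are not part of the framework.

def expected_dump_path(dump_dir, wiki, date):
    """Return where the dump of `wiki` (e.g. 'es', 'commons') for `date` should be."""
    file_name = "{0}wiki-{1}-pages-articles.xml.bz2".format(wiki, date)
    return os.path.join(dump_dir, wiki, date, file_name)

def dump_url(wiki, date):
    """Build a download URL following the pattern of the Commons link in this thread."""
    return ("http://download.wikimedia.org/{0}wiki/{1}/"
            "{0}wiki-{1}-pages-articles.xml.bz2").format(wiki, date)

def missing_dumps(dump_dir, wanted):
    """Return (path, url) pairs for every wanted (wiki, date) whose file is absent."""
    missing = []
    for wiki, date in wanted:
        path = expected_dump_path(dump_dir, wiki, date)
        if not os.path.isfile(path):
            missing.append((path, dump_url(wiki, date)))
    return missing
```

For your setup, calling missing_dumps("/home/rober/Escritorio/dbpedia/datos/pages",
[("commons", "20100319"), ("es", "20100311")]) would list the files that
are not yet in place, together with the URL each one would be fetched from.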

Could you please update from SVN and re-run the framework after
adding the line 'updateDumps=true' to your config.properties?
I'll update the online documentation later today.
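
For reference, a minimal sketch of the relevant lines in config.properties
(dumpDir and languages are copied from your message below, and your
extractor settings stay as they are; only the last line is new):

```properties
dumpDir=/home/rober/Escritorio/dbpedia/datos/pages
languages=es
updateDumps=true
```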

Cheers
Robert

On Tue, Mar 23, 2010 at 12:41 PM, Roberto Nieto <[email protected]> wrote:
> Hi Robert,
>
> Thanks for your attention, but my problem persists.
> I will try to explain my whole configuration, because I must be doing
> something strange:
>
> config.properties
> dumpDir=/home/rober/Escritorio/dbpedia/datos/pages
> outputDir=/home/rober/Escritorio/dbpedia/output
> extractors=org.dbpedia.extraction.mappings.LabelExtractor \
>            org.dbpedia.extraction.mappings.WikiPageExtractor \
>            org.dbpedia.extraction.mappings.InfoboxExtractor \
>            org.dbpedia.extraction.mappings.PageLinksExtractor \
>            org.dbpedia.extraction.mappings.GeoExtractor
>
> extractors.en=org.dbpedia.extraction.mappings.CategoryLabelExtractor \
>               org.dbpedia.extraction.mappings.ArticleCategoriesExtractor \
>               org.dbpedia.extraction.mappings.ImageExtractor \
>               org.dbpedia.extraction.mappings.ExternalLinksExtractor \
>               org.dbpedia.extraction.mappings.HomepageExtractor \
>               org.dbpedia.extraction.mappings.DisambiguationExtractor \
>               org.dbpedia.extraction.mappings.PersondataExtractor \
>               org.dbpedia.extraction.mappings.PndExtractor \
>               org.dbpedia.extraction.mappings.SkosCategoriesExtractor \
>               org.dbpedia.extraction.mappings.RedirectExtractor \
>               org.dbpedia.extraction.mappings.MappingExtractor
>
> languages=es
>
> As far as I understand, my dumps are in these paths:
>
> COMMONS:
> /home/rober/Escritorio/dbpedia/datos/pages/20100311/commons/commonswiki-20100319-pages-articles.xml.bz2
> SPANISH:
> /home/rober/Escritorio/dbpedia/datos/pages/20100319/eswiki/eswiki-20100311-pages-articles.xml.bz2
>
> But this error appeared:
>
> [INFO] Checking for multiple versions of scala
> [INFO] launcher 'Extract' selected => org.dbpedia.extraction.Extract
> Exception in thread "Thread-1" java.lang.Exception: Dump directory not found: /home/rober/Escritorio/dbpedia/datos/pages/commons
>     at org.dbpedia.extraction.ConfigLoader$Config.getDumpFile(ConfigLoader.scala:93)
>     at org.dbpedia.extraction.ConfigLoader$Config.<init>(ConfigLoader.scala:85)
>     at org.dbpedia.extraction.ConfigLoader$.load(ConfigLoader.scala:28)
>     at org.dbpedia.extraction.Extract$ExtractionThread.run(Extract.scala:26)
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESSFUL
> [INFO] ------------------------------------------------------------------------
>
>
> I also tried it with:
>
> /home/rober/Escritorio/dbpedia/datos/pages/commons/20100311/commonswiki-20100319-pages-articles.xml.bz2
> /home/rober/Escritorio/dbpedia/datos/pages/eswiki/20100319/eswiki-20100311-pages-articles.xml.bz2
>
> and:
>
> /home/rober/Escritorio/dbpedia/datos/pages/commons/commonswiki-20100319-pages-articles.xml.bz2
> /home/rober/Escritorio/dbpedia/datos/pages/eswiki/eswiki-20100311-pages-articles.xml.bz2
>
> But in this case:
>
> [INFO] launcher 'Extract' selected => org.dbpedia.extraction.Extract
> Exception in thread "Thread-1" java.lang.Exception: Dump not found: /home/rober/Escritorio/dbpedia/datos/pages/commons/20100319/commonswiki-20100319-pages-articles.xml
>     at org.dbpedia.extraction.ConfigLoader$Config.getDumpFile(ConfigLoader.scala:102)
>     at org.dbpedia.extraction.ConfigLoader$Config.<init>(ConfigLoader.scala:85)
>     at org.dbpedia.extraction.ConfigLoader$.load(ConfigLoader.scala:28)
>     at org.dbpedia.extraction.Extract$ExtractionThread.run(Extract.scala:26)
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESSFUL
> [INFO] ------------------------------------------------------------------------
>
>
>
> Any ideas? It must be a silly mistake.
>
> Thanks for everything
>
>
> 2010/3/23 Robert Isele <[email protected]>
>>
>> Hi Roberto,
>>
>> You can get the latest Wikipedia Commons dump at
>>
>> http://download.wikimedia.org/commonswiki/20100319/commonswiki-20100319-pages-articles.xml.bz2
>>
>> The file is expected at
>> {dumpDir}/20100319/commons/commonswiki-20100319-pages-articles.xml.bz2.
>>
>> Cheers
>> Robert
>>
>> On Tue, Mar 23, 2010 at 10:27 AM, Roberto Nieto <[email protected]>
>> wrote:
>> > Hi everyone,
>> >
>> > I'm trying to use the Information Extraction Framework, but I must be
>> > doing something wrong, because I'm having problems with the dumps.
>> >
>> > I downloaded the dump "eswikisource-20100317-pages-articles.xml.bz2",
>> > saved it in a folder, set the dumpDir configuration parameter to that
>> > folder and tried to run the extraction, but got:
>> >
>> > [INFO] launcher 'Extract' selected => org.dbpedia.extraction.Extract
>> > Exception in thread "Thread-1" java.lang.Exception: Dump directory not found: /home/rober/Escritorio/dbpedia/datos/pages/commons
>> >     at org.dbpedia.extraction.ConfigLoader$Config.getDumpFile(ConfigLoader.scala:93)
>> >     at org.dbpedia.extraction.ConfigLoader$Config.<init>(ConfigLoader.scala:85)
>> >     at org.dbpedia.extraction.ConfigLoader$.load(ConfigLoader.scala:28)
>> >     at org.dbpedia.extraction.Extract$ExtractionThread.run(Extract.scala:26)
>> > [INFO] ------------------------------------------------------------------------
>> > [INFO] BUILD SUCCESSFUL
>> >
>> >
>> > Reading the documentation, I saw this: "The dump files should be
>> > organized in the way as they are on the wikipedia servers.
>> > e.g. {dumpDir}/sc/20100306/scwiki-20100306-pages-articles.xml.bz2. In
>> > addition to the dumps of the configured languages, you'll need the
>> > Wikipedia Commons Dump."
>> >
>> > Now I'm not sure what "the Wikipedia Commons Dump" is, or whether I'm
>> > using the wrong dump.
>> >
>> > Can anyone help me?
>> >
>> > Thanks for the attention.
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > Download Intel® Parallel Studio Eval
>> > Try the new software tools for yourself. Speed compiling, find bugs
>> > proactively, and fine-tune applications for parallel performance.
>> > See why Intel Parallel Studio got high marks during beta.
>> > http://p.sf.net/sfu/intel-sw-dev
>> > _______________________________________________
>> > Dbpedia-discussion mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>> >
>> >
>
>
