Hi Gaurav,
Be patient.
It took me 4 hours to extract the Indonesian data dump.
I think it depends on the host specs and the size of the data dump.
Yes, the extracted triples are written to the same source directory.
Cheers,
Riko
________________________________
From: gaurav pant <[email protected]>
To: riko adi prasetya <[email protected]>
Sent: Tuesday, 5 March 2013 12:38
Subject: Re: [Dbpedia-discussion] extraction problem
Hi Riko,
Thanks for your reply. I have tried that change. It is running now, but it
has been waiting for a long time at
"
Mar 05, 2013 5:13:30 AM org.dbpedia.extraction.mappings.Redirects$ loadFromCache
INFO: Loading redirects from cache file
/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump/dewiki/20130219/dewiki-20130219-template-redirects.obj
Mar 05, 2013 5:13:30 AM org.dbpedia.extraction.mappings.Redirects$ load
INFO: Will extract redirects from source for de wiki, could not load cache file
'/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump/dewiki/20130219/dewiki-20130219-template-redirects.obj':
java.io.FileNotFoundException:
/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump/dewiki/20130219/dewiki-20130219-template-redirects.obj
(No such file or directory)
Mar 05, 2013 5:13:30 AM org.dbpedia.extraction.mappings.Redirects$
loadFromSource
INFO: Loading redirects from source (de)
Mar 05, 2013 5:28:58 AM
org.dbpedia.extraction.mappings.Redirects$RedirectFinder apply
WARNING: wrong redirect. page:
[title=Mikrogramm;ns=0/Main/;language:wiki=de,locale=de].
found by dbpedia: [title=Gramm;ns=0/Main/;language:wiki=de,locale=de].
found by wikipedia: [null]
"
Is it because I downloaded the pages-articles file manually instead of using
dbpedia-extraction, so that other required files were not downloaded?
Also, where will it write the extracted triples? In the same source directory?
On Tue, Mar 5, 2013 at 10:56 AM, riko adi prasetya <[email protected]>
wrote:
>Hi Gaurav,
>
>Please check your extraction.de.property again.
>
>You wrote dir, so the base-dir property that the extraction needs is not
>defined in your configuration.
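
In case it helps anyone hitting the same "property 'base-dir' not defined" error: below is a minimal Python sketch (not part of the extraction framework) that checks a properties file for the base-dir key before starting a run. The helper names load_properties and missing_keys are made up for illustration, the parser only handles plain key=value lines and '#' comments, and the idea that base-dir takes the same path previously given as dir is inferred from this thread, not from the framework documentation.

```python
def load_properties(text):
    """Parse plain key=value lines; skip blank lines and '#' comments.

    Simplification: backslash line continuations are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=("base-dir",)):
    """Return the required keys that are absent from props."""
    return [key for key in required if key not in props]


DUMP_DIR = "/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump"

# Configuration as originally posted: the path is stored under 'dir'.
old_config = f"dir={DUMP_DIR}\nsource=pages-articles.xml.bz2\nlanguages=de\n"

# Assumed fix: the same path stored under 'base-dir' instead.
new_config = f"base-dir={DUMP_DIR}\nsource=pages-articles.xml.bz2\nlanguages=de\n"

print(missing_keys(load_properties(old_config)))  # ['base-dir']
print(missing_keys(load_properties(new_config)))  # []
```

To check a real file, read it with open(path).read() and pass the text to load_properties.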
>
>Cheers,
>Riko
>
>
>________________________________
>Riko Adi Prasetya
>Faculty of Computer Science
>Universitas Indonesia
>
>
>
>________________________________
>From: gaurav pant <[email protected]>
>To: [email protected]
>Sent: Tuesday, 5 March 2013 12:10
>Subject: [Dbpedia-discussion] extraction problem
>
>
>Hi All,
>
>Greetings for the day.
>
>I want to extract infobox properties and abstracts from
>pages-articles.xml.bz2. I am able to download this file using the command
>"../run download config=download.de.properties".
>
>Here I have configured download.de.properties to download only the German
>pages-articles file.
>
>Now, when I try to extract information from it using "../run extraction
>extraction.de.property", it gives me the error below. In
>extraction.de.property I have set dir properly, to the same value that I
>used in download.de.properties.
>
>Please let me know what is going wrong. Does anything need to be changed in
>the pom.xml of the dump dir?
>
>"
>[INFO] --- maven-scala-plugin:2.15.2:testCompile (test-compile) @ dump ---
>[INFO] Checking for multiple versions of scala
>[INFO] includes = [**/*.scala,**/*.java,]
>[INFO] excludes = []
>[WARNING] No source files found.
>[INFO]
>[INFO] <<< maven-scala-plugin:2.15.2:run (default-cli) @ dump <<<
>[INFO]
>[INFO] --- maven-scala-plugin:2.15.2:run (default-cli) @ dump ---
>[INFO] Checking for multiple versions of scala
>[INFO] launcher 'extraction' selected =>
>org.dbpedia.extraction.dump.extract.Extraction
>java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
> at
>org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
>Caused by: java.lang.IllegalArgumentException: property 'base-dir' not defined.
> at
>org.dbpedia.extraction.dump.extract.ConfigParser.error(ConfigParser.scala:18)
> at org.dbpedia.extraction.dump.extract.Config.<init>(Config.scala:26)
> at
>org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:26)
> at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
> ... 6 more
>[INFO] ------------------------------------------------------------------------
>[INFO] BUILD FAILURE
>[INFO] ------------------------------------------------------------------------
>[INFO] Total time: 3.356s
>[INFO] Finished at: Tue Mar 05 04:52:35 UTC 2013
>[INFO] Final Memory: 8M/140M
>[INFO] ------------------------------------------------------------------------
>[ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run
>(default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException:
>Process exited with an error: 240(Exit value: 240) -> [Help 1]
>[ERROR]
>[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
>switch.
>[ERROR] Re-run Maven using the -X switch to enable full debug logging.
>[ERROR]
>[ERROR] For more information about the errors and possible solutions, please
>read the following articles:
>[ERROR] [Help 1]
>http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>"
>
>Contents of extraction.de.property:
>
>"# download and extraction target dir
>dir=/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump
>
># Source file. If source file name ends with .gz or .bz2, it is unzipped on the fly.
># Must exist in the directory xxwiki/20121231 and have the prefix xxwiki-20121231-.
>
># default:
># source=pages-articles.xml
>
># alternatives:
>source=pages-articles.xml.bz2
># source=pages-articles.xml.gz
>
># use only directories that contain a 'download-complete' file? Default is false.
>require-download-complete=true
>
># unqualified extractor class names are prefixed by org.dbpedia.extraction.mappings.
>
># All 111 languages that as of 2012-05-25 have 10000 articles or more.
># TODO: parse wikipedias.csv and figure out from there which languages to extract.
># If no languages are given, the ones having a mapping namespace on mappings.dbpedia.org are used
>languages=de
>
>extractors=InfoboxExtractor
>#ArticleCategoriesExtractor,CategoryLabelExtractor,ExternalLinksExtractor,\
>#GeoExtractor,InfoboxExtractor,LabelExtractor,PageIdExtractor,PageLinksExtractor,\
>#RedirectExtractor,RevisionIdExtractor,SkosCategoriesExtractor,WikiPageExtractor
>
>extractors.de=InfoboxExtractor
>#extractors.de=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor
>#extractors.en=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor
>
># if ontology and mapping files are not given or do not exist, download info from mappings.dbpedia.org
>ontology=../ontology.xml
>mappings=../mappings
>
># URI policies. Allowed flags: uri, generic, xml-safe. Each flag may have one of the suffixes
># -subjects, -predicates, -objects, -datatype, -context to match only URIs in a certain position.
># Without a suffix, a flag matches all URI positions.
>
>uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
>uri-policy.iri=generic:en; xml-safe-predicates:*
>
>
># File formats. Allowed flags: n-triples, n-quads, turtle-triples, turtle-quads, trix-triples, trix-quads
># May be followed by a semicolon and a URI policy name. If format name ends with .gz or .bz2, files
># are zipped on the fly.
>
># NT is unreadable anyway - might as well use URIs
>format.nt=n-triples;uri-policy.uri
>#format.nq.gz=n-quads;uri-policy.uri
>
># Turtle is much more readable - use nice IRIs
>format.ttl=turtle-triples;uri-policy.iri
>#format.tql.gz=turtle-quads;uri-policy.iri
>"
>
>--
>Regards
>Gaurav Pant
>+91-7709196607,+91-9405757794
>
>
>------------------------------------------------------------------------------
>Everyone hates slow websites. So do we.
>Make your web apps faster with AppDynamics
>Download AppDynamics Lite for free today:
>http://p.sf.net/sfu/appdyn_d2d_feb
>_______________________________________________
>Dbpedia-discussion mailing list
>[email protected]
>https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
>
--
Regards
Gaurav Pant
+91-7709196607,+91-9405757794