Hi Dimitris, hi all,
I've been quite busy the last few months, so now that I've finally found
some spare time, I'm taking another shot at DBpedia extraction for the
Serbian language.
I'm making some progress with the latest version of the extraction
framework (I was pretty much stuck at the beginning with the previous
one). I've successfully downloaded the latest Wikipedia dump, but I'm
having problems extracting any triples from it. Here's what I have in my
extraction config file (comments removed for clarity):
-------------------------------------------------------------
base-dir=/home/uros/dbpedia/dumps
source=srwiki/20131009/srwiki-20131009-pages-articles.xml.bz2
languages=sr
extractors.sr=LabelExtractor
ontology=../ontology.xml
mappings=../mappings
uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
uri-policy.iri=generic:en; xml-safe-predicates:*
format.nt.gz=n-triples;uri-policy.uri
format.nq.gz=n-quads;uri-policy.uri
format.ttl.gz=turtle-triples;uri-policy.iri
format.tql.gz=turtle-quads;uri-policy.iri
--------------------------------------------------------------
The extraction framework is located in
/home/uros/dbpedia/extraction-framework.
The dump is inside /home/uros/dbpedia/dumps/srwiki/20131009.
As for the 'source' parameter, I've tried both the absolute path and the
path relative to base-dir (and also leaving the parameter out completely),
and I've tried with and without the srwiki-yyyymmdd and sr: prefixes, but
all to no avail. Also, I'm trying a single extractor for starters, hoping
I'll get at least something (I'll fine-tune it later).
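
To rule out a plain path mismatch, here is a minimal Python sketch that
lists the locations the 'source' value could plausibly resolve to and
reports which of them exist on disk. The three interpretations (relative to
base-dir, a file name inside the dump directory, and a suffix appended to
the srwiki-yyyymmdd- prefix) are assumptions made for illustration, not
confirmed framework behaviour, and the helper name candidate_dump_paths is
hypothetical.

```python
from pathlib import Path


def candidate_dump_paths(base_dir, wiki, date, source):
    """Return labelled guesses for where 'source' might point.

    The interpretations are assumptions for illustration only; they are
    not taken from the extraction framework's code or documentation.
    """
    base = Path(base_dir)
    dump_dir = base / wiki / date          # e.g. <base-dir>/srwiki/20131009
    name = Path(source).name               # last path component of 'source'
    prefix = f"{wiki}-{date}-"             # e.g. srwiki-20131009-

    # Only prepend the prefix when the file name does not already carry it.
    prefixed = name if name.startswith(prefix) else prefix + name

    return {
        # An absolute 'source' replaces base here; a relative one is appended.
        "relative to base-dir": base / source,
        "file name in dump dir": dump_dir / name,
        "suffix after wiki-date prefix": dump_dir / prefixed,
    }


if __name__ == "__main__":
    paths = candidate_dump_paths(
        "/home/uros/dbpedia/dumps",
        "srwiki",
        "20131009",
        "srwiki/20131009/srwiki-20131009-pages-articles.xml.bz2",
    )
    for label, path in paths.items():
        status = "exists" if path.exists() else "missing"
        print(f"{label}: {path} ({status})")
```

If every candidate exists and the run still produces nothing, the problem
is probably not the path itself.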
When I run the extraction script with the above config file, I get the
following:
[INFO] Scanning for projects...
[INFO]
------------------------------------------------------------------------
[INFO] Building DBpedia Dump Extraction
[INFO] task-segment: [scala:run]
[INFO]
------------------------------------------------------------------------
[INFO] Preparing scala:run
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory
/home/uros/dbpedia/extraction-framework/dump/src/main/resources
[INFO] [scala:compile {execution: process-resources}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Nothing to compile - all classes are up to date
[INFO] [scala:compile {execution: compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory
/home/uros/dbpedia/extraction-framework/dump/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [scala:testCompile {execution: test-compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO] [scala:run {execution: default-cli}]
[INFO] Checking for multiple versions of scala
[WARNING] Not mainClass or valid launcher found/define
[INFO]
------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 2 seconds
[INFO] Finished at: Fri Oct 18 14:12:49 CEST 2013
[INFO] Final Memory: 25M/455M
[INFO]
------------------------------------------------------------------------
No changes in the target extraction dir, though. Any help would be much
appreciated!
Best regards,
Uroš