Marvin Lugair wrote:
> Hello,
>
> I would like to report back on my loading of dbpedia 3.2 into Open-Source
> Virtuoso 5.0.9.
> The good news is that I was successful and have a local DBPedia to play with
> now. Thanks to everyone for their input and suggestions on configuration
> parameters!
>
> Marv
>
> ----------------
>
> Running Ubuntu 8.10 (Intrepid)
> Kernel 2.6.27-7
> 8 GB DDR2 RAM
> AMD Athlon 2.5 GHz dual core
>
> It took around 22 hours to import the core (21 files) and build a .db database
> file from them. The import resulted in one dbpedia.db file that is about
> 20-something GB in size.
> It typically takes about an hour to start that database (load the .db file
> into memory) and start the virtuoso process.
> As a reference:
> Time to load infobox_en.nt = 52 minutes
>
>
> Some of the parameters in my dbpedia.ini
>
> MaxCheckpointRemap = 1000000
> MaxMemPoolSize = 0
> StopCompilerWhenXOverRunTime = 1
> DefaultIsolation = 2
> NumberOfBuffers = 550000
> MaxDirtyBuffers = 320000
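
For a sense of scale, the buffer settings quoted above can be turned into a rough memory figure. The sketch below assumes that each buffer holds one 8 KiB database page; that page size is an assumption on my part, not something stated in this thread, and the class and method names are made up for illustration.

```java
// Rough estimate of the memory claimed by the buffer pool settings quoted above.
// ASSUMPTION: one buffer = one 8 KiB page. Verify this against your own build.
final class BufferEstimate {
    static final long ASSUMED_BYTES_PER_BUFFER = 8L * 1024;

    // Total bytes the buffer pool would occupy under the page-size assumption.
    static long poolBytes(long numberOfBuffers) {
        return numberOfBuffers * ASSUMED_BYTES_PER_BUFFER;
    }

    // Share of the pool that may be dirty before it has to be written out.
    static double dirtyFraction(long maxDirtyBuffers, long numberOfBuffers) {
        return (double) maxDirtyBuffers / numberOfBuffers;
    }

    public static void main(String[] args) {
        long bytes = poolBytes(550_000);                 // NumberOfBuffers
        double gib = bytes / (1024.0 * 1024 * 1024);
        double dirty = dirtyFraction(320_000, 550_000);  // MaxDirtyBuffers / NumberOfBuffers
        System.out.printf("buffer pool: about %.1f GiB%n", gib);
        System.out.printf("dirty limit: about %.0f%% of the pool%n", 100 * dirty);
    }
}
```

Under that assumption, 550000 buffers come to roughly 4.2 GiB, which is about half of the 8 GB in the machine described above.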
>
>
> Files that had errors
> ---------------
> Three files did not load because of malformed URIs (about 500 of them across
> the three files, 400-something lines were in the externallinks file). I tried
> to reload these files with the ttlp_mt bit mask that ignores errors but it
> did not work.
> I deleted the corresponding triples and reloaded. Basically, you lose those
> triples. Someone needs to fix these in the DBpedia files.
>
>
> The three files with errors are:
> 1> homepage_en.nt
> 2> externallinks_en.nt
> 3> infobox-mappingbased-loose.nt
> The URIs had spaces, backslashes, or even Korean characters (in one
> case) in them. These files need cleaning up.
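
Until the dump files are fixed, one possible workaround is to screen the .nt files before loading and set aside the lines whose URIs contain the kinds of characters described above. The following is only a sketch: the class name and the exact rule (reject any URI containing whitespace, a backslash, or a non-ASCII character) are my own assumptions, and the rule is deliberately crude, so it would also reject URIs that use backslash escapes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-load check for N-Triples lines; not part of any loader.
final class NTriplesUriCheck {
    // Quoted literals are blanked first so '<' or '>' inside text is not read as a URI.
    private static final Pattern LITERAL = Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");
    private static final Pattern URI = Pattern.compile("<([^>]*)>");

    // True if the URI contains whitespace/control characters, a backslash, or non-ASCII.
    static boolean isSuspect(String uri) {
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c <= ' ' || c == '\\' || c > '~') {
                return true;
            }
        }
        return false;
    }

    // True if no URI on the line is suspect.
    static boolean lineLooksClean(String line) {
        String withoutLiterals = LITERAL.matcher(line).replaceAll("\"\"");
        Matcher m = URI.matcher(withoutLiterals);
        while (m.find()) {
            if (isSuspect(m.group(1))) {
                return false;
            }
        }
        return true;
    }

    // Keeps only the lines that pass the check, in their original order.
    static List<String> keepClean(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (lineLooksClean(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up example.org lines, not taken from the DBpedia files.
        List<String> lines = Arrays.asList(
            "<http://example.org/a> <http://example.org/p> <http://example.org/ok> .",
            "<http://example.org/a> <http://example.org/p> <http://example.org/has space> .",
            "<http://example.org/a> <http://example.org/p> \"text with < and > inside\" .");
        for (String line : keepClean(lines)) {
            System.out.println(line);
        }
    }
}
```

Writing the rejected lines to a separate file instead of discarding them would make it easier to report them upstream.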
>
>
>
>
> Some questions
> ---------------------------------------
> * Why does short-abstracts take 4 hours to load even though it is only 982 MB,
> whereas long-abstracts took 2 hours to load at 1.7 GB?
> The only difference is that short was loaded a few files after long. Does
> performance change as the database file (the one I am creating, dbpedia.db)
> grows larger?
>
> * What is the best way to check for and delete duplicate triples in the
> database?
>
> * Related to this last question, it seems the online DBpedia gateway at
> dbpedia.org/sparql does not return duplicates through the web page
> interface. However, it does return duplicates for the SAME query when
> submitted through Jena. To reproduce this, paste the following query into the
> web page:
>
> select ?s
> where {
> ?s
> <http://dbpedia.org/property/influenced>
> <http://dbpedia.org/resource/Chris_Rock>
> }
>
> This will return the following results in my web browser:
> http://dbpedia.org/resource/Bill_Cosby
> http://dbpedia.org/resource/Dick_Gregory
> http://dbpedia.org/resource/Eddie_Murphy
> http://dbpedia.org/resource/Flip_Wilson
> http://dbpedia.org/resource/George_Carlin
> http://dbpedia.org/resource/Mort_Sahl
> http://dbpedia.org/resource/Redd_Foxx
> http://dbpedia.org/resource/Richard_Pryor
> http://dbpedia.org/resource/Rodney_Dangerfield
> http://dbpedia.org/resource/Sam_Kinison
> http://dbpedia.org/resource/Steve_Martin
>
>
> No duplicates.
> Now run the *same* query through a Jena program.
> In my Java source, here is how I am connecting to what I assume is the SAME
> gateway:
> QueryExecution qexec =
> QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql", q);
>
> And here is what I get (again, this is the exact same query):
>
> ----------------------------------------------------
> | s |
> ====================================================
> | <http://dbpedia.org/resource/Bill_Cosby> |
> | <http://dbpedia.org/resource/Dick_Gregory> |
> | <http://dbpedia.org/resource/Eddie_Murphy> |
> | <http://dbpedia.org/resource/Flip_Wilson> |
> | <http://dbpedia.org/resource/George_Carlin> |
> | <http://dbpedia.org/resource/Mort_Sahl> |
> | <http://dbpedia.org/resource/Redd_Foxx> |
> | <http://dbpedia.org/resource/Richard_Pryor> |
> | <http://dbpedia.org/resource/Rodney_Dangerfield> |
> | <http://dbpedia.org/resource/Sam_Kinison> |
> | <http://dbpedia.org/resource/Steve_Martin> |
> | <http://dbpedia.org/resource/Bill_Cosby> |
> | <http://dbpedia.org/resource/Bill_Cosby> |
> | <http://dbpedia.org/resource/Dick_Gregory> |
> | <http://dbpedia.org/resource/Eddie_Murphy> |
> | <http://dbpedia.org/resource/Flip_Wilson> |
> | <http://dbpedia.org/resource/George_Carlin> |
> | <http://dbpedia.org/resource/Mort_Sahl> |
> | <http://dbpedia.org/resource/Redd_Foxx> |
> | <http://dbpedia.org/resource/Richard_Pryor> |
> | <http://dbpedia.org/resource/Rodney_Dangerfield> |
> | <http://dbpedia.org/resource/Sam_Kinison> |
> | <http://dbpedia.org/resource/Steve_Martin> |
> | <http://dbpedia.org/resource/Eddie_Murphy> |
> ----------------------------------------------------
>
> Duplicates!
> Can someone please explain this?
>
> As an aside, when I run this from isql on my newly installed local DBpedia I
> get no duplicates (I haven't tried Jena against my local copy).
>
>
> <eom>
>
>
>
>
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
Marvin,
You will see why when you run:
select *
where {graph ?g {
?s
<http://dbpedia.org/property/influenced>
<http://dbpedia.org/resource/Chris_Rock>
}}
As you can see, there are two graphs:
1. http://dbpedia.org
2. http://dbpedia.org/resource/<entity> (this one results from cache
activity associated with client interactions with Virtuoso)
Solutions:
-- Being specific about the source graph by specifying its Graph IRI
select ?s
where {graph <http://dbpedia.org> {
?s
<http://dbpedia.org/property/influenced>
<http://dbpedia.org/resource/Chris_Rock>
}}
OR
select ?s
from <http://dbpedia.org>
where {
?s
<http://dbpedia.org/property/influenced>
<http://dbpedia.org/resource/Chris_Rock>
}
-- Using DISTINCT
select distinct ?s
where {
?s
<http://dbpedia.org/property/influenced>
<http://dbpedia.org/resource/Chris_Rock>
}
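
To make the effect concrete, here is a small sketch in plain Java (no Jena or Virtuoso involved) of why the same statement held in two graphs produces repeated rows, and how the two fixes above behave. The in-memory store, the class, and its methods are all invented for illustration; the sample data only mimics the shape of the result, it is not the actual content of the endpoint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Toy model of a quad store; hypothetical, for illustration only.
final class GraphDuplicates {
    static final String INFLUENCED = "http://dbpedia.org/property/influenced";
    static final String CHRIS_ROCK = "http://dbpedia.org/resource/Chris_Rock";

    // One stored statement: the graph it lives in plus subject, predicate, object.
    static final class Quad {
        final String graph, subject, predicate, object;

        Quad(String graph, String subject, String predicate, String object) {
            this.graph = graph;
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Illustrative data: one statement appears in both the main and a per-entity graph.
    static List<Quad> sampleStore() {
        String mainGraph = "http://dbpedia.org";
        String cosby = "http://dbpedia.org/resource/Bill_Cosby";
        String murphy = "http://dbpedia.org/resource/Eddie_Murphy";
        return Arrays.asList(
            new Quad(mainGraph, cosby, INFLUENCED, CHRIS_ROCK),
            new Quad(mainGraph, murphy, INFLUENCED, CHRIS_ROCK),
            new Quad(cosby, cosby, INFLUENCED, CHRIS_ROCK));
    }

    // Returns ?s for a fixed predicate and object; graph == null means "any graph".
    static List<String> subjects(List<Quad> store, String graph, String predicate, String object) {
        List<String> rows = new ArrayList<>();
        for (Quad q : store) {
            boolean graphMatches = graph == null || graph.equals(q.graph);
            if (graphMatches && q.predicate.equals(predicate) && q.object.equals(object)) {
                rows.add(q.subject);
            }
        }
        return rows;
    }

    // Equivalent of SELECT DISTINCT: drops repeated rows, keeps first-seen order.
    static List<String> distinct(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<Quad> store = sampleStore();
        List<String> anyGraph = subjects(store, null, INFLUENCED, CHRIS_ROCK);
        List<String> oneGraph = subjects(store, "http://dbpedia.org", INFLUENCED, CHRIS_ROCK);
        System.out.println("any graph:  " + anyGraph.size() + " rows");
        System.out.println("one graph:  " + oneGraph.size() + " rows");
        System.out.println("distinct:   " + distinct(anyGraph).size() + " rows");
    }
}
```

With this sample data the unrestricted match yields three rows, while both the graph-restricted match and the DISTINCT variant yield two. Note that the two fixes are not interchangeable in general: restricting the graph also excludes statements that exist only in another graph, whereas DISTINCT keeps them.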
--
Regards,
Kingsley Idehen
President & CEO, OpenLink Software
Weblog: http://www.openlinksw.com/blog/~kidehen
Web: http://www.openlinksw.com