Peter Ansell wrote:
> 2008/11/23 Kingsley Idehen <[EMAIL PROTECTED] 
> <mailto:[EMAIL PROTECTED]>>
>
>     Marvin Lugair wrote:
>     > Hello,
>     >
>     > I would like to report back on my loading of dbpedia 3.2 into
>     Open-Source Virtuoso 5.0.9.
>     > The good news is that I was successful and have a local DBPedia
>     to play with now. Thanks to everyone for their input and
>     suggestions on configuration parameters!
>     >
>     > Marv
>     >
>     > ----------------
>     >
>     > Running Ubuntu 8.10 (intrepid)
>     > Kernel 2.6.27-7
>     > 8GB DDR2 RAM
>     > AMD Athlon 2.5 GHz dual core
>     >
>     > It took around 22 hours to import the core (21 files) and make a
>     .db database file out of them. The import resulted in one
>     dbpedia.db file that is about 20-something GB in size.
>     > It typically takes roughly an hour to start that database (load
>     the .db file in memory) and start the virtuoso process.
>     > As a reference:
>     > Time to load infobox_en.nt = 52 minutes
>     >
>     >
>     > Some of the parameters in my dbpedia.ini
>     >
>     > MaxCheckpointRemap              = 1000000
>     > MaxMemPoolSize                  = 0
>     > StopCompilerWhenXOverRunTime    = 1
>     > DefaultIsolation                = 2
>     > NumberOfBuffers                 = 550000
>     > MaxDirtyBuffers                 = 320000
>     >
>     >
>     > Files that had errors
>     > ---------------
>     > Three files did not load because of malformed URIs (about 500 of
>     them across the three files, 400-something lines were in the
>     externallinks file). I tried to reload these files with the
>     ttlp_mt bit mask that ignores errors but it did not work.
>     > I deleted the corresponding triples and reloaded. Basically you
>     lose those triples. Someone needs to fix these in the DBPedia files.
>     >
>     >
>     > The three files with errors are:
>     >  1> homepage_en.nt
>     >  2> externallinks_en.nt
>     >  3> infobox-mappingbased-loose.nt
>     > The URIs either had spaces, backslashes or even Korean
>     characters (in one case) in them. These files need cleaning up.
>     >
>     >
>     >
>     >
>     > Some questions
>     > ---------------------------------------
>     > * Why does short-abstracts take 4 hours to load though it is 982MB
>     > whereas long-abstracts took 2 hours to load though its size is
>     1.7 gigs?!
>     > The only difference is that short was loaded a few files after
>     long... does performance change as the database file (the one I am
>     creating, dbpedia.db) grows larger?
>     >
>     > * What is the best way to check for and delete duplicate triples
>     in the database?
>     >
>     > * Related to this last question, it seems the online dbpedia at
>     dbpedia.org/sparql <http://dbpedia.org/sparql> gateway does not
>     return duplicates over the webpage interface. However it does
>     return duplicates for the SAME query when submitted through Jena.
>     To reproduce this, paste the following query in the webpage:
>     >
>     > select ?s
>     > where {
>     > ?s
>     >  <http://dbpedia.org/property/influenced>
>     > <http://dbpedia.org/resource/Chris_Rock>
>     > }
>     >
>     > This will return the following results in my web browser:
>     > http://dbpedia.org/resource/Bill_Cosby
>     > http://dbpedia.org/resource/Dick_Gregory
>     > http://dbpedia.org/resource/Eddie_Murphy
>     > http://dbpedia.org/resource/Flip_Wilson
>     > http://dbpedia.org/resource/George_Carlin
>     > http://dbpedia.org/resource/Mort_Sahl
>     > http://dbpedia.org/resource/Redd_Foxx
>     > http://dbpedia.org/resource/Richard_Pryor
>     > http://dbpedia.org/resource/Rodney_Dangerfield
>     > http://dbpedia.org/resource/Sam_Kinison
>     > http://dbpedia.org/resource/Steve_Martin
>     >
>     >
>     > No duplicates.
>     > Now run the *same* query through a Jena program.
>     > In my Java source, here is how I am connecting to what I assume
>     is the SAME gateway!
>     >  QueryExecution qexec =
>     QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql", q);
>     >
>     > and here is what I get (again this is the exact same query):
>     >
>     > ----------------------------------------------------
>     > | s                                                |
>     > ====================================================
>     > | <http://dbpedia.org/resource/Bill_Cosby>         |
>     > | <http://dbpedia.org/resource/Dick_Gregory>       |
>     > | <http://dbpedia.org/resource/Eddie_Murphy>       |
>     > | <http://dbpedia.org/resource/Flip_Wilson>        |
>     > | <http://dbpedia.org/resource/George_Carlin>      |
>     > | <http://dbpedia.org/resource/Mort_Sahl>          |
>     > | <http://dbpedia.org/resource/Redd_Foxx>          |
>     > | <http://dbpedia.org/resource/Richard_Pryor>      |
>     > | <http://dbpedia.org/resource/Rodney_Dangerfield> |
>     > | <http://dbpedia.org/resource/Sam_Kinison>        |
>     > | <http://dbpedia.org/resource/Steve_Martin>       |
>     > | <http://dbpedia.org/resource/Bill_Cosby>         |
>     > | <http://dbpedia.org/resource/Bill_Cosby>         |
>     > | <http://dbpedia.org/resource/Dick_Gregory>       |
>     > | <http://dbpedia.org/resource/Eddie_Murphy>       |
>     > | <http://dbpedia.org/resource/Flip_Wilson>        |
>     > | <http://dbpedia.org/resource/George_Carlin>      |
>     > | <http://dbpedia.org/resource/Mort_Sahl>          |
>     > | <http://dbpedia.org/resource/Redd_Foxx>          |
>     > | <http://dbpedia.org/resource/Richard_Pryor>      |
>     > | <http://dbpedia.org/resource/Rodney_Dangerfield> |
>     > | <http://dbpedia.org/resource/Sam_Kinison>        |
>     > | <http://dbpedia.org/resource/Steve_Martin>       |
>     > | <http://dbpedia.org/resource/Eddie_Murphy>       |
>     > ----------------------------------------------------
>     >
>     > Duplicates!
>     > Can someone please explain this?
>     >
>     > As an aside, when I run this from isql on my newly installed
>     local dbpedia I get no duplicates (I haven't tried Jena against my
>     local copy).
>     >
>     >
>     > <eom>
>     >
>     >
>     >
>     >
>     >
>     >
>     -------------------------------------------------------------------------
>     > This SF.Net email is sponsored by the Moblin Your Move
>     Developer's challenge
>     > Build the coolest Linux based applications with Moblin SDK & win
>     great prizes
>     > Grand prize is a trip for two to an Open Source event anywhere
>     in the world
>     > http://moblin-contest.org/redirect.php?banner_id=100&url=/
>     <http://moblin-contest.org/redirect.php?banner_id=100&url=/>
>     > _______________________________________________
>     > Dbpedia-discussion mailing list
>     > [email protected]
>     <mailto:[email protected]>
>     > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>     >
>     >
>     Marvin,
>
>     You will see why when you run:
>
>     select *
>     where {graph ?g {
>     ?s
>      <http://dbpedia.org/property/influenced>
>     <http://dbpedia.org/resource/Chris_Rock>
>     }}
>
>
>     As you can see, there are two graphs:
>     1. http://dbpedia.org
>     2. http://dbpedia.org/resource/<entity> (this one results from cache
>     activity associated with client interactions with Virtuoso)
>
>     Solutions:
>     -- Being specific about source Graph by specifying Graph IRI
>     select ?s
>     where {graph <http://dbpedia.org> {
>     ?s
>      <http://dbpedia.org/property/influenced>
>     <http://dbpedia.org/resource/Chris_Rock>
>     }}
>
>     OR
>
>     select ?s
>     from <http://dbpedia.org>
>     where {
>     ?s
>      <http://dbpedia.org/property/influenced>
>     <http://dbpedia.org/resource/Chris_Rock>
>     }
>
>     -- Using DISTINCT
>
>     select distinct ?s
>     where {
>     ?s
>      <http://dbpedia.org/property/influenced>
>     <http://dbpedia.org/resource/Chris_Rock>
>     }
>
>
> What is the instruction to give with Jena/other clients etc. to make 
> it behave in the same way as the HTTP SPARQL page interface and not 
> resolve triples from the cache graphs?
>
> Cheers,
>
> Peter
>
Peter,
Qualify the GRAPH IRI in your query pattern using the examples above (or 
use DISTINCT as per example above).
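
For illustration only, here is a rough Java sketch of the "specify the
graph" option. It builds the same query with a FROM clause and, as an
alternative, a request URL that names the graph through the
default-graph-uri parameter of the SPARQL Protocol. The class and method
names are made up for this example. Only the query text and the graph
IRI come from the thread, and whether the live endpoint treats the
protocol parameter the same way as FROM is an assumption I have not
verified. The query string would take the place of the query passed to
sparqlService in Marvin's snippet.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Jena or Virtuoso.
public class GraphScopedQuery {
    // Graph IRI taken from the examples earlier in this thread.
    public static final String DBPEDIA_GRAPH = "http://dbpedia.org";

    // Builds the thread's query with an explicit FROM clause, so that
    // only triples in the named graph are matched.
    public static String influencedBy(String resourceIri, String graphIri) {
        return "select ?s\n"
             + "from <" + graphIri + ">\n"
             + "where {\n"
             + "  ?s <http://dbpedia.org/property/influenced> <" + resourceIri + ">\n"
             + "}";
    }

    // Alternative: keep the query text unchanged and name the graph via
    // the SPARQL Protocol's default-graph-uri request parameter.
    public static String requestUrl(String endpoint, String query, String graphIri) {
        return endpoint
             + "?default-graph-uri=" + URLEncoder.encode(graphIri, StandardCharsets.UTF_8)
             + "&query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String scoped = influencedBy("http://dbpedia.org/resource/Chris_Rock", DBPEDIA_GRAPH);
        System.out.println(scoped);

        String unscoped = "select ?s where { ?s <http://dbpedia.org/property/influenced> "
                        + "<http://dbpedia.org/resource/Chris_Rock> }";
        System.out.println(requestUrl("http://dbpedia.org/sparql", unscoped, DBPEDIA_GRAPH));
    }
}
```

Either form pins the query to the http://dbpedia.org graph, so rows
coming from the per-entity cache graphs should no longer show up.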


In relation to our Jena Provider also look at:
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtJenaSPARQLExample2


If this isn't clear, just send a Jena code excerpt.


-- 


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com




