Re: [Dbpedia-discussion] setting up DBpedia on local machine?

Andrew (Chuan) Khoo Wed, 05 Mar 2008 15:59:06 -0800

Thank you all,

I'm beginning to get it now. That first hurdle was intense!

Right now I'm creating a new database from scratch and reloading the datasets, to be sure I got everything right.

When dbpedia publishes updated datasets, Hugh, does your load_dbpedia.sh script update only new/changed entries, or does it re-import everything? I noticed the script moved the completed dataset files into a 'READY' folder, which is a nice touch, but I'm just wondering how I should go about future loads: should I expect to delete the database and rebuild from scratch, or can I just update in place with the same .sh scripts?

That said, I also modified the .ini file to Christian Becker's recommended parameters: http://www4.wiwiss.fu-berlin.de/benchmarks-200801/#rdfstores (although I changed them a little as my specs are slightly different).

Before adjusting the parameters, I noticed that my CPU load would crank up nicely at the beginning and then dwindle to 10-15%, with furious paging to disk towards the end. Is this to be expected? Hopefully with the adjustments the load will be able to fully utilize everything my little machine has to offer.



Cheers,
Andrew



On Mar 5, 2008, at 10:11 AM, Hugh Williams wrote:

Andrew,

On 05/03/2008 00:26, "Andrew (Chuan) Khoo" <[EMAIL PROTECTED]> wrote:
Hi Hugh, Omid, all,
Hugh: You beat me to the punch there. I was typing my reply about how I finally got the datasets loading (yes, I had to edit the isql port numbers from 1118 in your .sh files to 1111) when I got your mail. Thanks so much! I compiled wget from http://www.gnu.org/software/wget/ and that should work; the loading is still chugging along, and should be finished in an hour or so.
I have a few more questions for Hugh, and everyone. I know I'm being a pain, but please bear with me while I work this out:
1)
load_dbpedia.sh had this line:
exec="ttlp_mt (file_to_string_output ('$f'), '', 'http://dbpedia.org');" > temp.res
as well as this one:
exec="sparql select count(*) where { ?s <http://dbpedia.org/property/wordnet_type> ?o};" > temp.res
Should I be changing the graph URI to something else? How do I do that? I am still stumped by the massiveness of Virtuoso, and I'm not sure how to access the dbpedia datasets once they are loaded. I know I can call sparql queries via http://localhost:8890/sparql, but should I be concerned with defining the name of the graph URI?
[Hugh] The graph URI is the data source name; it is <http://dbpedia.org>, as set by the ttlp_mt() call above. If you are seeking to expose this data on the Web, then you can create URL re-write rules that map Virtual Directories to SPARQL DESCRIBE/CONSTRUCT queries as detailed at:
    
http://virtuoso.openlinksw.com/Whitepapers/html/VirtLinkedDataDeployment.html#URL-Rewriting
This is how Linked Data deployment occurs. Thus, you will run SPARQL against <http://dbpedia.org> internally in response to the URI scheme used to publish this data. I would suggest you read the entire “Virtuoso Linked Data Deployment” white paper referenced above for details on the process.
Would you also be able to give some really quick examples of how to call sparql queries on the local dataset? I expect them to run the same as on dbpedia.org's public query page (http://dbpedia.org/sparql), but just in case.
[Hugh] The following URL gives sample queries for use with Virtuoso and other query tools:
    http://wiki.dbpedia.org/OnlineAccess#h28-7
The following Dbpedia benchmark results also provide additional queries:
    http://www4.wiwiss.fu-berlin.de/benchmarks-200801/
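As a rough illustration of the graph URI discussion above, a query can be sent to the local endpoint with curl, scoped to the graph the data was loaded into. This is only a sketch: it assumes the server is listening at http://localhost:8890/sparql and that the data sits in <http://dbpedia.org>, as described in this thread. `query` and `default-graph-uri` are the standard SPARQL Protocol parameter names; the function name `run_sparql` and the `ENDPOINT`, `GRAPH` and `CURL` variables are invented for this example.

```shell
# Sketch only: send a SPARQL query to a local Virtuoso endpoint.
# ENDPOINT, GRAPH and CURL are illustrative names, overridable from the environment.
ENDPOINT="${ENDPOINT:-http://localhost:8890/sparql}"
GRAPH="${GRAPH:-http://dbpedia.org}"
CURL="${CURL:-curl}"

run_sparql() {
    # $1 = SPARQL query text
    # -G turns the url-encoded fields into a GET query string.
    "$CURL" -s -G "$ENDPOINT" \
        --data-urlencode "default-graph-uri=$GRAPH" \
        --data-urlencode "query=$1"
}

# Example (needs a running server):
# run_sparql 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5'
```

Because the graph is passed as a protocol parameter, the query text itself can stay identical to what would be typed into the public dbpedia.org query page.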


2)
If I am importing just shortabstracts_en, articles_label_en and image_en, does the census processing portion of post_install.sh still apply to my case?
[Hugh] This section of the script was specifically added for our in-house test case and can be skipped if you do not require census data.
3)
Eventually I intend to call the sparql queries via php5 or java. Is there a quick way to do it from the Virtuoso Open Source server, or should I look at alternatives like Perl? My main application is a local Java applet that will send out the queries, and read back the results for further processing.
[Hugh] SPARQL queries are best executed against a SPARQL Endpoint via the SPARQL Protocol (a REST or SOAP Web Service), which would be of the form http://<hostname>:<port>/sparql in the case of Virtuoso. Although, as Fred suggests in his post, you can also execute SPARQL queries against Virtuoso via ODBC or JDBC directly. Virtuoso also includes RDF Data providers for Sesame and Jena, enabling these frameworks to query the Virtuoso Quad store:
    http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSRDFDataProviders
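Since the SPARQL Protocol is plain HTTP, the request a PHP or Java client would make can be prototyped from the shell first. The sketch below builds an endpoint URL of the form given above and asks for results as JSON through the `Accept` header. `application/sparql-results+json` is the standard media type for SPARQL JSON results; that the server honors it is an assumption here, and the function names and the `CURL` variable are made up for illustration.

```shell
# Sketch only: the HTTP request a programmatic client would issue.
CURL="${CURL:-curl}"

sparql_endpoint() {
    # $1 = hostname, $2 = port
    printf 'http://%s:%s/sparql\n' "$1" "$2"
}

sparql_json() {
    # $1 = hostname, $2 = port, $3 = SPARQL query text
    "$CURL" -s -G "$(sparql_endpoint "$1" "$2")" \
        -H 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$3"
}

# Example (needs a running server):
# sparql_json localhost 8890 'SELECT ?s WHERE { ?s ?p ?o } LIMIT 3'
```

Any HTTP client library can reproduce the same request: a GET to the endpoint with a url-encoded `query` parameter and an `Accept` header naming the wanted result format.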

Thank you.
Omid, to your response: There were a number of main issues I encountered, and I still have problems and questions myself, but I can provide the following points. I'm still waiting for the datasets to finish loading, and in time, when I have everything running smoothly, I will post a more concise list here:
1) Read Virtuoso's README file, especially re: the required libraries, before you start configuring and building the binaries. I compiled it on Leopard (10.5.2), and it works. It took a long time though, about 1.5 hours to get it all compiled and installed. Stop Leopard's Apache server (if it's running) if you are concerned about it conflicting with the build, but it should be OK to leave it on, since Virtuoso sets itself up at localhost:8890 by default.
2) To set up the Virtuoso server, you can copy the virtuoso.ini file (located in /yourvirtuosoinstallprefix/var/lib/virtuoso/db) to a location where you want your database stored, rename it to dbpedia.ini and then type "sudo virtuoso-t -c dbpedia.ini -f" (excluding quotes; I also omitted the & at the end as I wanted to see in Terminal the status messages of the server while it starts up).
3) Unpack the .sh script files Hugh kindly provided. You will have to go into the .sh files to modify the port number. For me, Virtuoso set the default port of 1111, so I had to update that in the .sh files before I could execute them successfully.
4) And check the points Hugh made below.
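The port change in step 3 can be sketched as a small helper that prints a copy of a loader script with the old port number replaced, leaving the original file untouched. This is only an illustration: the function name is invented, the 1118 and 1111 values are the ones mentioned in this thread, and the blunt substitution would also rewrite any unrelated occurrence of the old number, so the output is worth inspecting before use.

```shell
# Sketch only: print a script with one port number swapped for another.
rewrite_port() {
    # $1 = old port, $2 = new port, $3 = script file
    # Writes to stdout instead of editing in place, so the original is kept.
    sed "s/$1/$2/g" "$3"
}

# Example, using the values from this thread:
# rewrite_port 1118 1111 load_dbpedia.sh > load_dbpedia.local.sh
#
# Starting the server as described in step 2 (run from the directory
# that holds dbpedia.ini):
# sudo virtuoso-t -c dbpedia.ini -f
```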
[Hugh] Note that we are going to be producing a public document on loading the Dbpedia datasets in Virtuoso, which I shall post the link to on this mailing list when available.
Regards
Hugh


Cheers,
Andrew

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
