Thank you all,

I'm beginning to get it now. That first hurdle was intense!

Right now I'm creating a new database from scratch and reloading the datasets, to be sure I got everything right.

When dbpedia publishes updated datasets, Hugh, does your load_dbpedia.sh script update new/changed entries, or does it re-import everything? I noticed that the script moves completed dataset files into a 'READY' folder, which is a nice touch, but I'm wondering how I should handle future loads: should I expect to delete the database and rebuild it from scratch, or can I update in place with the same .sh scripts?
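Until that question is answered, it may help to see which downloaded files would be loaded for the first time and which were already loaded once. Since load_dbpedia.sh moves completed files into a 'READY' folder, a minimal Python sketch could split the files still waiting into those two groups. The directory layout and the `*.nt` pattern are assumptions, and `classify_downloads` is a hypothetical helper, not part of Hugh's scripts. It does not answer whether re-loading updates or duplicates entries; it only flags the files where that matters.

```python
from pathlib import Path


def classify_downloads(data_dir: str, pattern: str = "*.nt") -> tuple[list[str], list[str]]:
    """Split dataset files waiting in data_dir into (new, previously_loaded).

    Assumption: load_dbpedia.sh moves each loaded file into data_dir/READY,
    so a file still in data_dir has not been loaded yet. If a file with the
    same name already sits in READY, it was loaded before, and loading the
    new copy re-imports that dataset. The '*.nt' pattern is hypothetical.
    """
    base = Path(data_dir)
    ready_dir = base / "READY"
    ready = {p.name for p in ready_dir.glob(pattern)} if ready_dir.is_dir() else set()
    waiting = sorted(p.name for p in base.glob(pattern) if p.is_file())
    new = [name for name in waiting if name not in ready]
    previously_loaded = [name for name in waiting if name in ready]
    return new, previously_loaded
```

For example, `new, reloads = classify_downloads("/path/to/datasets")` (hypothetical path) lists the re-downloaded files in `reloads`, which are the ones to check before reusing the same scripts.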

I have also modified the .ini file to use Christian Becker's recommended parameters: http://www4.wiwiss.fu-berlin.de/benchmarks-200801/#rdfstores (although I changed them a little, as my machine's specs are slightly different).

Before adjusting the parameters, I noticed that my CPU load would crank up nicely at the beginning, then dwindle to 10-15% while the machine paged to disk furiously towards the end. Is this to be expected? Hopefully, with the adjustments, the load will be able to fully utilize everything my little machine has to offer.
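Heavy paging late in a load may indicate that the server's memory use has grown beyond the free physical RAM. The virtuoso.ini settings usually tuned for this are NumberOfBuffers and MaxDirtyBuffers in the [Parameters] section. A minimal Python sketch of one rule of thumb, assuming about 8,700 bytes of memory per buffer (one 8 KB page plus overhead), about two-thirds of free RAM for buffers, and MaxDirtyBuffers at three-quarters of NumberOfBuffers. These ratios are roughly in line with comments in later virtuoso.ini templates; they are not Christian Becker's benchmark settings, so treat the output as a starting point only.

```python
BYTES_PER_BUFFER = 8700  # assumption: approx. memory per 8 KB page buffer


def buffer_settings(free_ram_bytes: int, ram_fraction: float = 2 / 3,
                    dirty_ratio: float = 0.75) -> dict[str, int]:
    """Suggest NumberOfBuffers / MaxDirtyBuffers from free RAM (rule of thumb)."""
    if free_ram_bytes <= 0:
        raise ValueError("free_ram_bytes must be positive")
    # Give the buffer pool a fraction of free RAM, leaving room for the OS
    # and the rest of the server process so the machine does not page.
    buffers = int(free_ram_bytes * ram_fraction) // BYTES_PER_BUFFER
    return {
        "NumberOfBuffers": buffers,
        # Cap how many cached pages may be modified before being flushed.
        "MaxDirtyBuffers": int(buffers * dirty_ratio),
    }


def ini_fragment(settings: dict[str, int]) -> str:
    """Render the settings as lines for the [Parameters] section."""
    lines = ["[Parameters]"]
    lines += [f"{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines)
```

For a machine with 2 GB free, `print(ini_fragment(buffer_settings(2 * 1024**3)))` suggests NumberOfBuffers = 164558 and MaxDirtyBuffers = 123418. The two values belong in the existing [Parameters] section rather than in a second copy of it.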


Cheers,
Andrew



On Mar 5, 2008, at 10:11 AM, Hugh Williams wrote:

Andrew,

On 05/03/2008 00:26, "Andrew (Chuan) Khoo" <[EMAIL PROTECTED]> wrote:

Hi Hugh, Omid, all,

Hugh: You beat me to the punch there; I was typing my reply about how I finally got the datasets loading (yes, I had to change the isql port numbers in your .sh files from 1118 to 1111) when your mail arrived. Thanks so much! I compiled wget from source (http://www.gnu.org/software/wget/) and that should work. The loading is still chugging along and should be finished in an hour or so.

I have a few more questions for Hugh, and everyone. I know I'm being a pain, but please bear with me while I work this out:

1)
load_dbpedia.sh had this line:

exec="ttlp_mt (file_to_string_output ('$f'), '', 'http://dbpedia.org');" > temp.res

as well as this one:

exec="sparql select count(*) where { ?s <http://dbpedia.org/property/wordnet_type> ?o};" > temp.res
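Both lines are statements passed to Virtuoso's isql command-line client through its exec= argument. A minimal Python sketch that builds the same two calls as argument lists, which makes it easy to see where the graph URI goes. It assumes isql is on the PATH, the server listens on port 1111, and the default dba/dba credentials are still in place; the file path is also illustrative. Because file_to_string_output() reads the file on the server side, the path presumably has to be readable by the Virtuoso server process.

```python
import subprocess

ISQL = "isql"                  # assumption: Virtuoso's isql client is on PATH
PORT = "1111"                  # default server port mentioned in this thread
USER, PASSWORD = "dba", "dba"  # assumption: default credentials; change them
GRAPH = "http://dbpedia.org"   # graph URI used by the ttlp_mt() line above


def sql_quote(text: str) -> str:
    """Quote a SQL string literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def load_command(path: str, graph: str = GRAPH) -> list[str]:
    """isql argv that loads one file into `graph`, as load_dbpedia.sh does."""
    stmt = (f"ttlp_mt (file_to_string_output ({sql_quote(path)}), '', "
            f"{sql_quote(graph)});")
    return [ISQL, PORT, USER, PASSWORD, f"exec={stmt}"]


def count_command(prop: str = "http://dbpedia.org/property/wordnet_type") -> list[str]:
    """isql argv for the sanity-check count query in load_dbpedia.sh."""
    stmt = f"sparql select count(*) where {{ ?s <{prop}> ?o }};"
    return [ISQL, PORT, USER, PASSWORD, f"exec={stmt}"]


def run(argv: list[str]) -> str:
    """Run isql and return its output; raises if isql exits non-zero."""
    return subprocess.run(argv, capture_output=True, text=True,
                          check=True).stdout
```

Changing the graph URI then means passing a different `graph` to `load_command`; queries must later name that same graph.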

Should I be changing the graph URI to something else? How do I do that? I am still stumped by the sheer size of Virtuoso, and I'm not sure how to access the dbpedia datasets once they are loaded. I know I can run SPARQL queries via http://localhost:8890/sparql, but should I be concerned with defining the name of the graph URI?

[Hugh] The graph URI is the data source name, and is <http://dbpedia.org> based on the ttlp_mt() usage above. If you want to expose this data on the Web, you can create URL rewrite rules that map Virtual Directories to SPARQL DESCRIBE/CONSTRUCT queries, as detailed at:

    http://virtuoso.openlinksw.com/Whitepapers/html/VirtLinkedDataDeployment.html#URL-Rewriting

This is how Linked Data deployment works: you run SPARQL queries against <http://dbpedia.org> internally, in response to requests for the URIs used to publish this data. I suggest reading the entire “Virtuoso Linked Data Deployment” white paper referenced above for details of the process.
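To make the rewrite idea concrete, here is a minimal Python sketch of the mapping such a rule performs: a request path under a hypothetical /resource/ prefix becomes a DESCRIBE query against the <http://dbpedia.org> graph, sent to the local endpoint. The prefix, the function name, and the validation are illustrative assumptions. In Virtuoso itself this mapping is configured as rewrite rules on a Virtual Directory, as the white paper describes, not written in Python.

```python
from urllib.parse import urlencode

ENDPOINT = "http://localhost:8890/sparql"  # Virtuoso's default local endpoint
GRAPH = "http://dbpedia.org"               # graph the data was loaded into
PREFIX = "/resource/"                      # hypothetical published path prefix
BAD_IRI_CHARS = set(' <>"{}|^`\\')         # characters not allowed inside <...>


def describe_url(request_path: str) -> str:
    """Map /resource/<name> to an endpoint URL that runs DESCRIBE on it."""
    if not request_path.startswith(PREFIX):
        raise ValueError(f"path must start with {PREFIX!r}")
    name = request_path[len(PREFIX):]
    if not name or BAD_IRI_CHARS & set(name):
        raise ValueError(f"unusable resource name: {name!r}")
    resource = f"{GRAPH}/resource/{name}"
    query = f"DESCRIBE <{resource}> FROM <{GRAPH}>"
    # urlencode escapes the query text for use as a GET parameter.
    return ENDPOINT + "?" + urlencode({"query": query})
```

So a request for /resource/Berlin would be answered by `DESCRIBE <http://dbpedia.org/resource/Berlin> FROM <http://dbpedia.org>`.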


Could you also give some really quick examples of how to run SPARQL queries on the local dataset? I expect them to work the same as on dbpedia.org's public query page (http://dbpedia.org/sparql), but just in case.

[Hugh] The following URL gives sample queries for use with Virtuoso and other query tools:

    http://wiki.dbpedia.org/OnlineAccess#h28-7

The following Dbpedia benchmark results also provide additional queries:

    http://www4.wiwiss.fu-berlin.de/benchmarks-200801/
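For the local copy, a query can be sent to the local endpoint over HTTP using the SPARQL Protocol's `query` parameter, with `default-graph-uri` naming the graph the data was loaded into. A minimal Python sketch, assuming the server runs on the default localhost:8890 and returns SPARQL JSON results for the Accept header shown. The example query also assumes the labels dataset uses rdfs:label.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:8890/sparql"  # default local Virtuoso endpoint


def query_url(query: str, graph: str = "http://dbpedia.org",
              endpoint: str = ENDPOINT) -> str:
    """Build a SPARQL Protocol GET URL; default-graph-uri scopes the query."""
    return endpoint + "?" + urlencode({"query": query, "default-graph-uri": graph})


def run_select(query: str, graph: str = "http://dbpedia.org") -> dict:
    """Send a SELECT query and return the decoded JSON result document."""
    req = Request(query_url(query, graph),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req, timeout=60) as resp:
        return json.load(resp)


# Assumption: article labels are stored with rdfs:label.
EXAMPLE = """
SELECT ?s ?label WHERE {
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 10
"""
```

With the server running, `run_select(EXAMPLE)` should return the same kind of result the public dbpedia.org page gives, limited to the locally loaded graph.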


2)
If I am importing just shortabstracts_en, articles_label_en and image_en, does the census processing portion of post_install.sh still apply to my case?

[Hugh] This section of the script was added specifically for our in-house test case and can be skipped if you do not require census data.

3)
Eventually I intend to run the SPARQL queries from PHP 5 or Java. Is there a quick way to do this with the Virtuoso Open Source server, or should I look at alternatives like Perl? My main application is a local Java applet that will send out the queries and read back the results for further processing.

[Hugh] SPARQL queries are best executed against a SPARQL endpoint via the SPARQL Protocol (a REST or SOAP Web service), which in the case of Virtuoso takes the form http://<hostname>:<port>/sparql. However, as Fred suggests in his post, you can also execute SPARQL queries against Virtuoso directly via ODBC or JDBC. Virtuoso also includes RDF data providers for Sesame and Jena, enabling these frameworks to query the Virtuoso quad store:

    http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSRDFDataProviders
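Whichever client language is used, results from a SPARQL endpoint come back in a standard format; asking for application/sparql-results+json (the W3C SPARQL JSON results format) keeps the client side simple. A minimal Python sketch of flattening such a response into one dictionary per solution; the same logic carries over to Java or PHP with their JSON libraries. The sample response in the note below is made up for illustration.

```python
import json


def solution_rows(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row.

    Unbound variables are absent from a binding, so they are omitted here
    too. Only each term's 'value' is kept; 'type', 'xml:lang' and
    'datatype' are dropped for brevity.
    """
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append({n: binding[n]["value"] for n in names if n in binding})
    return rows
```

For a response such as `{"head": {"vars": ["s", "label"]}, "results": {"bindings": [{"s": {"type": "uri", "value": "http://dbpedia.org/resource/Berlin"}, "label": {"type": "literal", "xml:lang": "en", "value": "Berlin"}}]}}`, the function returns `[{"s": "http://dbpedia.org/resource/Berlin", "label": "Berlin"}]`.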

Thank you.

Omid, in response to your question: I ran into a number of major issues, and I still have problems and questions myself, but I can offer the following points. I'm still waiting for the datasets to finish loading; when I have everything running smoothly, I will post a more concise list here:

1) Read Virtuoso's README file, especially regarding the required libraries, before you start configuring and building the binaries. I compiled it on Leopard (10.5.2), and it works. It took a long time, though: about 1.5 hours to compile and install everything. If you are concerned about Leopard's Apache server conflicting with the build, stop it (if it's running), but it should be OK to leave it on, since Virtuoso sets itself up at localhost:8890 by default.

2) To set up the Virtuoso server, copy the virtuoso.ini file (located in /yourvirtuosoinstallprefix/var/lib/virtuoso/db) to the location where you want your database stored, rename it to dbpedia.ini, and then run "sudo virtuoso-t -c dbpedia.ini -f" (excluding quotes; I also omitted the & at the end, as I wanted to see the server's status messages in Terminal while it starts up).

3) Unpack the .sh script files Hugh kindly provided. You will have to edit the .sh files to set the port number. For me, Virtuoso used the default port of 1111, so I had to update that in the .sh files before I could execute them successfully.

4) Also check the points Hugh made below.
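The port change in step 3 was done by hand (from 1118 to 1111). A minimal Python sketch that does the same across all .sh files in a directory, assuming the scripts invoke isql with the port as its first argument (for example `isql 1118 dba dba ...`). Calls written any other way, such as host:port forms or a port held in a shell variable, are not matched and would still need editing by hand.

```python
import re
from pathlib import Path

# Matches "isql" followed by whitespace and a port number, e.g. "isql 1118".
# Assumption: the scripts pass the port as isql's first argument.
ISQL_PORT = re.compile(r"(\bisql\s+)(\d+)\b")


def set_isql_port(text: str, port: int) -> str:
    """Rewrite every 'isql <port>' occurrence in a script to use `port`."""
    return ISQL_PORT.sub(lambda m: f"{m.group(1)}{port}", text)


def patch_scripts(directory: str, port: int = 1111) -> list[str]:
    """Apply set_isql_port to each *.sh file; return names of changed files."""
    changed = []
    for script in sorted(Path(directory).glob("*.sh")):
        old = script.read_text()
        new = set_isql_port(old, port)
        if new != old:
            script.write_text(new)
            changed.append(script.name)
    return changed
```

Running `patch_scripts(".")` in the directory holding the unpacked scripts rewrites them in place, so keeping a copy of the originals first seems sensible.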

[Hugh] Note that we will be producing a public document on loading the Dbpedia datasets into Virtuoso, and I shall post a link to it on this mailing list when it is available.

Regards
Hugh


Cheers,
Andrew



_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
