Thank you all,
I'm beginning to get it now. That first hurdle was intense!
Right now I'm creating a new database from scratch and reloading the
datasets, to be sure I got everything right.
Hugh, when dbpedia publishes updated datasets, does your
load_dbpedia.sh script update only the new/changed entries, or does it
re-import everything? I noticed the script moves the completed
dataset files into a 'READY' folder, which is a nice touch, but I'm
wondering how I should handle future loads: do I need to delete the
database and rebuild from scratch, or can I just update in place with
the same .sh scripts?
I also modified the .ini file to use Christian Becker's
recommended parameters: http://www4.wiwiss.fu-berlin.de/benchmarks-200801/#rdfstores
(although I changed them a little, as my specs are slightly different).
Before adjusting the parameters, I noticed that my CPU load cranks
up nicely at the beginning, then dwindles to 10-15% while the machine
pages to disk furiously towards the end. Is this to be expected?
Hopefully with the adjustments the load will be able to fully utilize
everything my little machine has to offer.
Cheers,
Andrew
On Mar 5, 2008, at 10:11 AM, Hugh Williams wrote:
Andrew,
On 05/03/2008 00:26, "Andrew (Chuan) Khoo" <[EMAIL PROTECTED]> wrote:
Hi Hugh, Omid, all,
Hugh: You beat me to the punch there; I was typing my reply about how
I finally got the datasets loading (yes, I had to edit the isql
port numbers from 1118 in your .sh files to 1111) when I got your
mail. Thanks so much! I compiled wget from http://www.gnu.org/software/wget/
and that should work; the loading is still chugging along, and
should be finished in an hour or so.
I have a few more questions for Hugh, and everyone. I know I'm
being a pain, but please bear with me while I work this out:
1)
load_dbpedia.sh had this line:
exec="ttlp_mt (file_to_string_output ('$f'), '', 'http://dbpedia.org');" > temp.res
as well as this one:
exec="sparql select count(*) where { ?s <http://dbpedia.org/property/wordnet_type> ?o};" > temp.res
Should I be changing the graph URI to something else? How do I do
that? I am still stumped by the massiveness of Virtuoso, and I'm
not sure how to access the dbpedia datasets once they are loaded. I
know I can call sparql queries via http://localhost:8890/sparql,
but should I be concerned with defining the name of the graph uri?
[Hugh] The Graph-URI is the data source name, and is
<http://dbpedia.org> based on the ttlp_mt() usage above. If you are
seeking to expose this data on the Web, then you can create URL
re-write rules that map Virtual Directories to SPARQL
DESCRIBE/CONSTRUCT queries as detailed at:
http://virtuoso.openlinksw.com/Whitepapers/html/VirtLinkedDataDeployment.html#URL-Rewriting
This is how Linked Data deployment occurs. Thus, you will SPARQL
against <http://dbpedia.org> internally in response to the URI
scheme used to publish this data. I would suggest you read the
entire “Virtuoso Linked Data Deployment” White paper referenced
above for details on the process.
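To illustrate the point about the graph URI, the count query from load_dbpedia.sh can be scoped explicitly to <http://dbpedia.org> with a FROM clause. The following is only a rough sketch in Python: the helper name count_query is made up for illustration and is not part of the scripts, and it only builds the query text that would be pasted into the /sparql page or sent to the endpoint.

```python
# Hypothetical helper (not part of load_dbpedia.sh): builds the text of a
# SPARQL count query restricted to a single named graph.
DBPEDIA_GRAPH = "http://dbpedia.org"  # graph URI passed to ttlp_mt() in the script


def count_query(predicate_uri, graph_uri=DBPEDIA_GRAPH):
    """Return a SPARQL query counting triples with the given predicate in one graph."""
    return (
        "SELECT COUNT(*) "
        f"FROM <{graph_uri}> "
        f"WHERE {{ ?s <{predicate_uri}> ?o }}"
    )


if __name__ == "__main__":
    # Prints the query text only; nothing is sent to a server here.
    print(count_query("http://dbpedia.org/property/wordnet_type"))
```

If the datasets were loaded under a different graph URI, passing that URI as the second argument would be the only change needed in this sketch.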
Could you also give some really quick examples of how to
call sparql queries on the local dataset? I expect them to run the
same as on dbpedia.org's public query page (http://dbpedia.org/sparql),
but just in case.
[Hugh] The following URL gives sample queries for use with Virtuoso
and other query tools:
http://wiki.dbpedia.org/OnlineAccess#h28-7
The following Dbpedia benchmark results also provide additional
queries:
http://www4.wiwiss.fu-berlin.de/benchmarks-200801/
2)
If I am importing just shortabstracts_en, articles_label_en and
image_en, does the census processing portion of post_install.sh
still apply to my case?
[Hugh] This section of the script was specifically added for our
in-house test case and can be skipped if you do not require census data.
3)
Eventually I intend to call the sparql queries via php5 or java. Is
there a quick way to do it from the Virtuoso Open Source server, or
should I look at alternatives like Perl? My main application is a
local Java applet that will send out the queries, and read back the
results for further processing.
[Hugh] SPARQL Queries are best executed against a SPARQL Endpoint
via the SPARQL Protocol (a REST or SOAP Web Service), which would
be of the form http://<hostname>:<port>/sparql in the case of
Virtuoso. As Fred suggests in his post, you can also
execute SPARQL queries against Virtuoso via ODBC or JDBC directly.
Virtuoso also includes RDF Data providers for Sesame and Jena,
enabling these frameworks to query the Virtuoso Quad store:
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSRDFDataProviders
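As a minimal sketch of the SPARQL Protocol route described above, a query can be sent to the endpoint as an ordinary HTTP GET. The example below is in Python rather than PHP or Java, but the same request can be built in any language with an HTTP client. It assumes a local server at http://localhost:8890/sparql and that the endpoint returns JSON results when asked via the Accept header; the function names build_request and run_query are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Virtuoso instance on its default HTTP port.
ENDPOINT = "http://localhost:8890/sparql"
JSON_RESULTS = "application/sparql-results+json"


def build_request(query, endpoint=ENDPOINT, graph_uri="http://dbpedia.org"):
    """Encode a SPARQL query as a SPARQL Protocol GET request."""
    params = urllib.parse.urlencode({
        "query": query,                  # standard SPARQL Protocol parameter
        "default-graph-uri": graph_uri,  # standard SPARQL Protocol parameter
    })
    # Assumption: the server honours content negotiation for JSON results.
    return urllib.request.Request(endpoint + "?" + params,
                                  headers={"Accept": JSON_RESULTS})


def run_query(query, **kwargs):
    """Send the query and return the parsed JSON result document."""
    with urllib.request.urlopen(build_request(query, **kwargs)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the request URL; call run_query() once the server is running.
    print(build_request("SELECT ?s ?o WHERE { ?s ?p ?o } LIMIT 5").full_url)
```

With the server up, run_query("SELECT ?s ?o WHERE { ?s ?p ?o } LIMIT 5") should return the result document as a Python dictionary, which can then be processed further.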
Thank you.
Omid, to your response: There were a number of main issues I
encountered, and I still have problems and questions myself, but I
can provide the following points. I'm still waiting for the
datasets to finish loading, and in time, when I have everything
running smoothly I will post a more concise list here:
1) Read Virtuoso's README file, especially re: the required
libraries before you start configuring and building the binaries. I
compiled it on Leopard (10.5.2), and it works. Took a long time
though, about 1.5 hours to get it all compiled and installed. Stop
Leopard's Apache server (if it's running) if you are concerned about
it conflicting with the build, but it should be ok to leave it on;
Virtuoso sets itself up at localhost:8890 by default.
2) To set up the Virtuoso server, copy the virtuoso.ini
file (located in /yourvirtuosoinstallprefix/var/lib/virtuoso/db) to
the location where you want your database stored, rename it to
dbpedia.ini and then type "sudo virtuoso-t -c dbpedia.ini -f"
(excluding quotes; I also omitted the & at the end as I wanted to
see the server's status messages in Terminal while it starts
up).
3) Unpack the .sh script files Hugh kindly provided. You will have
to go into the .sh files to modify the port number. For me,
Virtuoso used the default port of 1111, so I had to update that
in the .sh files before I could execute them successfully.
4) And check the points Hugh made below.
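The port change in step 3 can also be scripted rather than done by hand. This is only a sketch: it assumes the old port appears in the scripts as the standalone number 1118, the function name retarget_port is made up, and the sample isql line is illustrative rather than copied from the actual .sh files.

```python
import re


def retarget_port(script_text, old_port="1118", new_port="1111"):
    """Replace standalone occurrences of old_port with new_port in a script's text."""
    # Digit lookarounds keep longer numbers such as 11180 untouched.
    pattern = r"(?<!\d)" + re.escape(old_port) + r"(?!\d)"
    return re.sub(pattern, new_port, script_text)


if __name__ == "__main__":
    # Illustrative line only; the real isql invocation in the scripts may differ.
    sample = 'isql 1118 dba dba exec="status();"'
    print(retarget_port(sample))
```

To apply it, read each .sh file into a string, pass it through retarget_port, and write the result back, keeping a backup copy of the original scripts.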
[Hugh] Note that we are going to produce a public document on
loading the Dbpedia datasets into Virtuoso; I shall post the
link to this mailing list when it is available.
Regards
Hugh
Cheers,
Andrew
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion