Peter DeVries wrote:
I have installed the new version of Virtuoso Open Source and I seem to
be having the same crawling problem with the following example (very
preliminary) data sets:
It seems to access the first RDF page and then stop.
http://lod.taxonconcept.org/
http://ocs.geospecies.org/
The RDF starts at "index.rdf" http://ocs.geospecies.org/index.rdf
but even if I specify that URI, I get the same behavior.
Do you have sitemaps with Semantic Extensions now? If so, let's have the
URLs.
There is a new crawler that also allows you to designate crawl paths, etc.
Kingsley
Suggestions?
- Pete
On Tue, Dec 29, 2009 at 10:56 AM, Hugh Williams <[email protected]> wrote:
Hi Peter,
This issue with the Virtuoso Crawler has been recreated and is
scheduled to be fixed for the next release. A short term
workaround would be to query the Virtuoso SPARQL endpoint
(/sparql) with the "Retrieve remote RDF data for all missing
source graphs" option (get:soft pragma) set, for example:
http://demo.openlinksw.com/sparql?default-graph-uri=http%3A%2F%2Flod.geospecies.org%2Findex.rdf&should-sponge=soft&query=select+*+where+%7B%3Fs+%3Fp+%3Fo%7D&format=text%2Fhtml&debug=on
Or use some of the other Virtuoso pragma options to tailor your query
as required. Further details on IRI dereferencing and the use of
pragmas can be obtained from:
http://docs.openlinksw.com/virtuoso/rdfiridereferencing.html
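The workaround URL above can also be assembled programmatically, which avoids hand-encoding the graph URI and the query. The following is a minimal Python sketch, not part of the original reply: the function name `build_sponge_url` is hypothetical, and the parameter names (`default-graph-uri`, `should-sponge`, `query`, `format`, `debug`) are simply copied from the example URL. It only builds the URL and does not contact the endpoint.

```python
from urllib.parse import urlencode


def build_sponge_url(endpoint, graph_uri,
                     query="select * where {?s ?p ?o}",
                     sponge="soft"):
    """Build a /sparql request URL that asks the endpoint to fetch
    the named graph if it is missing (should-sponge=soft).

    Hypothetical helper: the parameter names are taken from the
    example URL in this thread, not from a documented client API.
    """
    params = {
        "default-graph-uri": graph_uri,
        "should-sponge": sponge,
        "query": query,
        "format": "text/html",
        "debug": "on",
    }
    # urlencode percent-encodes reserved characters and turns
    # spaces into '+', matching the form-style URL shown above.
    return endpoint + "?" + urlencode(params)


url = build_sponge_url(
    "http://demo.openlinksw.com/sparql",
    "http://lod.geospecies.org/index.rdf",
)
print(url)
```

Swapping the `graph_uri` argument for another RDF document (for example http://ocs.geospecies.org/index.rdf) should give the equivalent request for that data set, assuming the endpoint is allowed to retrieve remote data.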
Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink
On 28 Dec 2009, at 18:16, Peter DeVries wrote:
Hi!
I have installed the latest Virtuoso open source and I am having
trouble getting it to crawl my data set.
The crawler downloads the first nine pages but then stops.
This happens when the target is http://lod.geospecies.org/ or
http://lod.geospecies.org/index.rdf.
The same dataset (rdf pages) can be successfully crawled with Elmo.
I think that this has something to do with a preference for
crawling the .xhtml pages rather than the rdf pages.
Is there something I should be including in the crawler input screen?
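One way to test the .xhtml-versus-RDF hypothesis outside the crawler is to request a page with an explicit RDF `Accept` header and look at the `Content-Type` the server returns. The sketch below is an illustration only, using the Python standard library; the helper names `rdf_request` and `served_content_type` are hypothetical, and it says nothing about which headers the Virtuoso crawler itself sends.

```python
from urllib.request import Request, urlopen


def rdf_request(url, accept="application/rdf+xml"):
    """Build a GET request that explicitly asks for RDF/XML.

    Hypothetical helper for a manual check; nothing is sent here.
    """
    return Request(url, headers={"Accept": accept})


def served_content_type(url, accept="application/rdf+xml", timeout=10):
    """Send the request and return the Content-Type header of the
    response, so an RDF answer can be told apart from an XHTML one."""
    with urlopen(rdf_request(url, accept), timeout=timeout) as resp:
        return resp.headers.get("Content-Type")
```

Calling `served_content_type("http://lod.geospecies.org/index.rdf")` and then the same function on `http://lod.geospecies.org/` would show whether the server answers the root URL with XHTML even when RDF is requested.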
Thanks!
- Pete
----------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
GeoSpecies Knowledge Base
About the GeoSpecies Knowledge Base
------------------------------------------------------------
_______________________________________________
Virtuoso-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen