The taxonconcept.org demo site has a sitemap file at
http://lod.taxonconcept.org/sitemap.xml

The geospecies.org site has a sitemap in two forms:
http://lod.geospecies.org/sitemap.gz
and
http://lod.geospecies.org/sitemap.xml (big)

This site does not have one yet:
http://ocs.geospecies.org/

FYI: The taxonconcept and ocs.geospecies sites are currently just tests, and the ontology etc. will change.

I will look into crawling via the sitemap extension.

- Pete

On Thu, Mar 11, 2010 at 6:59 PM, Kingsley Idehen <[email protected]> wrote:

> Peter DeVries wrote:
>
>> I have installed the new version of Virtuoso Open Source and I seem to be
>> having the same crawling problem
>> with the following example (very preliminary) data sets:
>>
>> It seems to access the first RDF page and then stop.
>>
>> http://lod.taxonconcept.org/
>>
>> http://ocs.geospecies.org/
>>
>> The RDF starts at "index.rdf" http://ocs.geospecies.org/index.rdf
>>
>> but even if I specify that URI, I get the same behavior.
>>
>
> Do you have sitemaps with Semantic Extensions now? If so, let's have the
> URLs.
>
> There is a new crawler that also allows you to designate crawl paths etc.
>
> Kingsley
>
>>
>> Suggestions?
>>
>> - Pete
>>
>>
>> On Tue, Dec 29, 2009 at 10:56 AM, Hugh Williams
>> <[email protected]> wrote:
>>
>> Hi Peter,
>>
>> This issue with the Virtuoso Crawler has been recreated and is
>> scheduled to be fixed for the next release.
>> A short-term workaround would be to query the Virtuoso SPARQL endpoint
>> (/sparql) with the "Retrieve remote RDF data for all missing
>> source graphs" option (get:soft pragma) set, for example:
>>
>> http://demo.openlinksw.com/sparql?default-graph-uri=http%3A%2F%2Flod.geospecies.org%2Findex.rdf&should-sponge=soft&query=select+*+where+{%3Fs+%3Fp+%3Fo}&format=text%2Fhtml&debug=on
>>
>> Or use some of the other Virtuoso pragma options to tailor your query
>> as required. Further details on IRI de-referencing and use of
>> pragmas can be obtained from:
>>
>> http://docs.openlinksw.com/virtuoso/rdfiridereferencing.html
>>
>> Best Regards
>> Hugh Williams
>> Professional Services
>> OpenLink Software
>> Web: http://www.openlinksw.com
>> Support: http://support.openlinksw.com
>> Forums: http://boards.openlinksw.com/support
>> Twitter: http://twitter.com/OpenLink
>>
>> On 28 Dec 2009, at 18:16, Peter DeVries wrote:
>>
>>> Hi!
>>>
>>> I have installed the latest Virtuoso open source and I am having
>>> trouble getting it to crawl my data set.
>>>
>>> The crawler downloads the first nine pages but then stops.
>>>
>>> This happens when the target is http://lod.geospecies.org/ or
>>> http://lod.geospecies.org/index.rdf.
>>>
>>> The same dataset (rdf pages) can be successfully crawled with Elmo.
>>>
>>> I think that this has something to do with a preference for
>>> crawling the .xhtml pages rather than the rdf pages.
>>>
>>> Is there something I should be including in the crawler input screen?
>>>
>>> Thanks!
>>>
>>> - Pete
>>>
>>> ----------------------------------------------------------------
>>> Pete DeVries
>>> Department of Entomology
>>> University of Wisconsin - Madison
>>> 445 Russell Laboratories
>>> 1630 Linden Drive
>>> Madison, WI 53706
>>> GeoSpecies Knowledge Base
>>> About the GeoSpecies Knowledge Base
>>> ------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Virtuoso-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
> --
>
> Regards,
>
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
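
For anyone who wants to script the workaround Hugh describes in the thread (querying /sparql with should-sponge=soft so the endpoint fetches the missing source graph), here is a minimal Python sketch that only builds the request URL. The endpoint, graph URI and parameter names are copied from the example URL in the thread. The function name build_sponge_url and its defaults are my own, purely for illustration, and nothing here contacts the server or has been tested against a live Virtuoso instance.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sponge_url(endpoint, graph_uri,
                     query="select * where {?s ?p ?o}",
                     sponge="soft"):
    """Build a SPARQL GET URL that asks the endpoint to sponge a graph.

    Hypothetical helper: the parameter names mirror the example URL
    quoted in the thread, not a documented client API.
    """
    params = {
        "default-graph-uri": graph_uri,   # graph to load if it is missing
        "should-sponge": sponge,          # "soft" corresponds to get:soft
        "query": query,
        "format": "text/html",
        "debug": "on",
    }
    # urlencode percent-encodes reserved characters and turns spaces into "+"
    return endpoint + "?" + urlencode(params)


url = build_sponge_url(
    "http://demo.openlinksw.com/sparql",
    "http://lod.geospecies.org/index.rdf",
)
print(url)

# Round-trip check: decode the query string back into its parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["should-sponge"], decoded["default-graph-uri"])
```

The encoded URL may differ cosmetically from the one in the thread (for example, urlencode escapes "*" and the braces), but it decodes to the same parameters. Swapping in a different graph_uri, such as http://ocs.geospecies.org/index.rdf, should be all that is needed to try the other test sites.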
