I have installed the new version of Virtuoso Open Source and I seem to be having the same crawling problem with the following example (very preliminary) data sets:

http://lod.taxonconcept.org/
http://ocs.geospecies.org/

It seems to access the first RDF page and then stop. The RDF starts at "index.rdf" (http://ocs.geospecies.org/index.rdf), but even if I specify that URI, I get the same behavior.

Suggestions?

- Pete

On Tue, Dec 29, 2009 at 10:56 AM, Hugh Williams <[email protected]> wrote:
> Hi Peter,
>
> This issue with the Virtuoso Crawler has been recreated and is scheduled to
> be fixed for the next release. A short-term workaround would be to query the
> Virtuoso SPARQL endpoint (/sparql) with the "Retrieve remote RDF data for
> all missing source graphs" option (get:soft pragma) set, for example:
>
> http://demo.openlinksw.com/sparql?default-graph-uri=http%3A%2F%2Flod.geospecies.org%2Findex.rdf&should-sponge=soft&query=select+*+where+%7B%3Fs+%3Fp+%3Fo%7D&format=text%2Fhtml&debug=on
>
> Or use some of the other Virtuoso pragma options to tailor your query as
> required. Further details on IRI dereferencing and the use of pragmas can be
> obtained from:
>
> http://docs.openlinksw.com/virtuoso/rdfiridereferencing.html
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software
> Web: http://www.openlinksw.com
> Support: http://support.openlinksw.com
> Forums: http://boards.openlinksw.com/support
> Twitter: http://twitter.com/OpenLink
>
> On 28 Dec 2009, at 18:16, Peter DeVries wrote:
>
> Hi!
>
> I have installed the latest Virtuoso open source and I am having trouble
> getting it to crawl my data set.
>
> The crawler downloads the first nine pages but then stops.
>
> This happens when the target is http://lod.geospecies.org/ or
> http://lod.geospecies.org/index.rdf.
>
> The same dataset (RDF pages) can be successfully crawled with Elmo.
>
> I think that this has something to do with a preference for crawling the
> .xhtml pages rather than the RDF pages.
>
> Is there something I should be including in the crawler input screen?
>
> Thanks!
>
> - Pete
>
> ----------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> GeoSpecies Knowledge Base
> About the GeoSpecies Knowledge Base
> ------------------------------------------------------------
>
> _______________________________________________
> Virtuoso-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
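The workaround Hugh describes is just an HTTP GET against the /sparql endpoint with a handful of query parameters. The following is a minimal Python sketch (not an OpenLink tool) that assembles such a URL from the parameters visible in his example link (default-graph-uri, should-sponge, query, format, debug). The helper names build_sponge_url and run_query are hypothetical, and the note about a local endpoint on port 8890 is an assumption about a default install; check it against your own Virtuoso configuration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint from Hugh's example. A local install would typically be something
# like http://localhost:8890/sparql (port 8890 assumed as the default).
ENDPOINT = "http://demo.openlinksw.com/sparql"


def build_sponge_url(endpoint, graph_uri, query, sponge="soft",
                     fmt="text/html", debug=True):
    """Hypothetical helper: build a SPARQL GET URL carrying the same
    parameters as the example link in this thread."""
    params = {
        "default-graph-uri": graph_uri,  # remote RDF document to dereference
        "should-sponge": sponge,          # "soft": fetch only missing source graphs
        "query": query,
        "format": fmt,                    # result format, as in the example link
    }
    if debug:
        params["debug"] = "on"            # only sent when requested
    # urlencode percent-encodes every value (spaces become "+").
    return endpoint + "?" + urlencode(params)


def run_query(url):
    """Hypothetical helper: fetch the result page (needs network access)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    url = build_sponge_url(ENDPOINT, "http://lod.geospecies.org/index.rdf",
                           "select * where {?s ?p ?o}")
    print(url)
```

Note that urlencode encodes "*" as "%2A", so the printed URL differs cosmetically from the hand-written link but should decode to the same query string on the server side.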
