Re: [Virtuoso-users] Virtuoso Crawler Stops Crawling after only a few pages

Peter DeVries Thu, 11 Mar 2010 20:41:10 +0000

I have installed the new version of Virtuoso Open Source and I seem to be
having the same crawling problem with the following example (very
preliminary) data sets:

http://lod.taxonconcept.org/
http://ocs.geospecies.org/

The crawler seems to access the first RDF page and then stop.

The RDF starts at "index.rdf":

http://ocs.geospecies.org/index.rdf

but even if I specify that URI, I get the same behavior.

Suggestions?

- Pete

On Tue, Dec 29, 2009 at 10:56 AM, Hugh Williams <[email protected]> wrote:

> Hi Peter,
>
> This issue with the Virtuoso Crawler has been reproduced and is scheduled to
> be fixed in the next release. A short-term workaround is to query the
> Virtuoso SPARQL endpoint (/sparql) with the "Retrieve remote RDF data for
> all missing source graphs" option (get:soft pragma) set, for example:
>
>
> http://demo.openlinksw.com/sparql?default-graph-uri=http%3A%2F%2Flod.geospecies.org%2Findex.rdf&should-sponge=soft&query=select+*+where+%7B%3Fs+%3Fp+%3Fo%7D&format=text%2Fhtml&debug=on
>
> Or use some of the other Virtuoso pragma options to tailor your query as
> required. Further details on IRI dereferencing and the use of pragmas can
> be found at:
>
> http://docs.openlinksw.com/virtuoso/rdfiridereferencing.html
>
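For anyone who wants to script the workaround above rather than paste the URL by hand, here is a minimal sketch in Python that assembles the same kind of request. The parameter names (default-graph-uri, should-sponge, query, format, debug) are taken from the example URL; the function name build_sponge_url is hypothetical, and the encoding may differ cosmetically from the URL above (for example "*" may come out as "%2A"), which should be equivalent as far as the endpoint is concerned.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sponge_url(endpoint, graph_uri, query="select * where {?s ?p ?o}"):
    """Build a SPARQL endpoint URL that asks Virtuoso to fetch the graph.

    build_sponge_url is a hypothetical helper, not part of Virtuoso.
    The parameter names mirror the example URL in this thread.
    """
    params = [
        ("default-graph-uri", graph_uri),  # graph to dereference
        ("should-sponge", "soft"),         # the "soft" retrieval option
        ("query", query),
        ("format", "text/html"),
        ("debug", "on"),
    ]
    # urlencode percent-encodes each value and joins the pairs with "&".
    return endpoint + "?" + urlencode(params)


url = build_sponge_url(
    "http://demo.openlinksw.com/sparql",
    "http://lod.geospecies.org/index.rdf",
)
print(url)
```

Swapping in a local endpoint (for example your own /sparql) and your own index.rdf URI should be all that is needed to try it against another data set.
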
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software
> Web: http://www.openlinksw.com
> Support: http://support.openlinksw.com
> Forums: http://boards.openlinksw.com/support
> Twitter: http://twitter.com/OpenLink
>
> On 28 Dec 2009, at 18:16, Peter DeVries wrote:
>
> Hi!
>
> I have installed the latest Virtuoso open source and I am having trouble
> getting it to crawl my data set.
>
> The crawler downloads the first nine pages but then stops.
>
> This happens when the target is http://lod.geospecies.org/ or
> http://lod.geospecies.org/index.rdf.
>
> The same dataset (rdf pages) can be successfully crawled with Elmo.
>
> I think that this has something to do with a preference for crawling the
> .xhtml pages rather than the rdf pages.
>
> Is there something I should be including in the crawler input screen?
>
>
> Thanks!
>
> - Pete
>
>
>
> ----------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> GeoSpecies Knowledge Base
> About the GeoSpecies Knowledge Base
> ------------------------------------------------------------


-- 
----------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
GeoSpecies Knowledge Base
About the GeoSpecies Knowledge Base
------------------------------------------------------------
