Peter DeVries wrote:
I have installed the new version of Virtuoso Open Source, and I seem to be having the same crawling problem
with the following example (very preliminary) data sets:

http://lod.taxonconcept.org/

http://ocs.geospecies.org/

It seems to access the first RDF page and then stop.

The RDF starts at "index.rdf": http://ocs.geospecies.org/index.rdf

but even if I specify that URI, I get the same behavior.

Do you have sitemaps with Semantic Extensions now? If so, let's have the URLs.

There is a new crawler that also allows you to designate crawl paths, etc.

Kingsley
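For reference, a semantic sitemap is an ordinary sitemap.xml with extra dataset elements. The Python sketch below generates a minimal one. It assumes the DERI Semantic Sitemaps extension namespace and element names (sc:dataset, sc:datasetLabel, sc:linkedDataPrefix, sc:sampleURI); the helper name semantic_sitemap is hypothetical, and the URLs passed in are just the data sets mentioned in this thread used as placeholders, not published sitemap values.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: the core sitemaps.org schema plus the DERI
# Semantic Sitemaps extension ("sc:" prefix).
SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
SC = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"
ET.register_namespace("", SM)
ET.register_namespace("sc", SC)


def semantic_sitemap(label: str, prefix: str, sample_uri: str,
                     changefreq: str = "weekly") -> str:
    """Return a minimal sitemap.xml body describing a single RDF dataset.

    semantic_sitemap is a hypothetical helper, not part of Virtuoso.
    """
    urlset = ET.Element(f"{{{SM}}}urlset")
    dataset = ET.SubElement(urlset, f"{{{SC}}}dataset")
    ET.SubElement(dataset, f"{{{SC}}}datasetLabel").text = label
    # Resource URIs under this prefix are meant to dereference to RDF.
    ET.SubElement(dataset, f"{{{SC}}}linkedDataPrefix").text = prefix
    # One example resource a crawler can start from.
    ET.SubElement(dataset, f"{{{SC}}}sampleURI").text = sample_uri
    # changefreq comes from the core sitemap namespace.
    ET.SubElement(dataset, f"{{{SM}}}changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(semantic_sitemap("GeoSpecies OCS (placeholder label)",
                           "http://ocs.geospecies.org/",
                           "http://ocs.geospecies.org/index.rdf"))
```

The extension also defines elements such as sc:sparqlEndpointLocation and sc:dataDumpLocation; assuming the crawler honours them, they let it fetch a data set directly instead of following links page by page.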

Suggestions?

- Pete

On Tue, Dec 29, 2009 at 10:56 AM, Hugh Williams wrote:

    Hi Peter,

    This issue with the Virtuoso Crawler has been reproduced and is
    scheduled to be fixed in the next release. A short-term
    workaround is to query the Virtuoso SPARQL endpoint
    (/sparql) with the "Retrieve remote RDF data for all missing
    source graphs" option (get:soft pragma) set, for example:

    
http://demo.openlinksw.com/sparql?default-graph-uri=http%3A%2F%2Flod.geospecies.org%2Findex.rdf&should-sponge=soft&query=select+*+where+{%3Fs+%3Fp+%3Fo}&format=text%2Fhtml&debug=on
    

    Or use some of the other Virtuoso pragma options to tailor your query
    as required. Further details on IRI dereferencing and the use of
    pragmas are available at:

    http://docs.openlinksw.com/virtuoso/rdfiridereferencing.html
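The workaround URL above is just the /sparql endpoint with a handful of query-string parameters, so it can be rebuilt for other graphs programmatically. A minimal Python sketch, using only the parameter names that appear in the example URL (default-graph-uri, should-sponge, query, format, debug); the helper name sponge_url is hypothetical:

```python
from urllib.parse import urlencode

# Endpoint and graph taken from the example URL in this thread.
ENDPOINT = "http://demo.openlinksw.com/sparql"
GRAPH = "http://lod.geospecies.org/index.rdf"


def sponge_url(endpoint: str, graph: str, query: str,
               sponge: str = "soft", fmt: str = "text/html") -> str:
    """Build a GET URL asking the endpoint to retrieve (sponge) the
    remote default graph before answering the query."""
    params = {
        "default-graph-uri": graph,
        # "soft": retrieve remote RDF only for source graphs not already loaded.
        "should-sponge": sponge,
        "query": query,
        "format": fmt,
        "debug": "on",
    }
    # urlencode percent-encodes each value (spaces become "+").
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(sponge_url(ENDPOINT, GRAPH, "select * where {?s ?p ?o}"))
```

Swapping GRAPH for http://ocs.geospecies.org/index.rdf would, under the same assumptions, ask the endpoint to sponge that data set instead.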

    Best Regards
    Hugh Williams
    Professional Services
    OpenLink Software
    Web: http://www.openlinksw.com
    Support: http://support.openlinksw.com
    Forums: http://boards.openlinksw.com/support
    Twitter: http://twitter.com/OpenLink

    On 28 Dec 2009, at 18:16, Peter DeVries wrote:

    Hi!

    I have installed the latest Virtuoso open source and I am having
    trouble getting it to crawl my data set.

    The crawler downloads the first nine pages but then stops.

    This happens when the target is http://lod.geospecies.org/ or
    http://lod.geospecies.org/index.rdf.

    The same dataset (RDF pages) can be crawled successfully with Elmo.

    I think this has something to do with a preference for
    crawling the .xhtml pages rather than the RDF pages.
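One way to check this hypothesis is to request the start URI the way an RDF-aware client would and see which Content-Type comes back. A minimal Python sketch, assuming the server picks the RDF or XHTML representation from the Accept header (the thread does not say whether it does); rdf_request, is_rdf and check are hypothetical helper names:

```python
from urllib.request import Request, urlopen

# Media types treated as RDF for this check (an assumption, not exhaustive).
RDF_TYPES = ("application/rdf+xml", "text/turtle", "application/n-triples")


def rdf_request(uri: str) -> Request:
    """Build a GET request that prefers RDF/XML over HTML."""
    return Request(uri, headers={
        "Accept": "application/rdf+xml;q=1.0, text/html;q=0.5"})


def is_rdf(content_type: str) -> bool:
    """True if a Content-Type header value names an RDF serialization."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in RDF_TYPES


def check(uri: str) -> None:
    """Network call: report what the server returns to an RDF-preferring client."""
    with urlopen(rdf_request(uri), timeout=10) as resp:
        ctype = resp.headers.get("Content-Type", "")
        # geturl() is the final URL after any redirects urlopen followed.
        print(resp.geturl(), ctype, "RDF" if is_rdf(ctype) else "not RDF")
```

Running check("http://lod.geospecies.org/") prints the final URL and whether the response was RDF; an XHTML response to an RDF-preferring request would support the idea that the crawler is being steered to the .xhtml pages.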

    Is there something I should be including in the crawler input screen?


    Thanks!

    - Pete



    ----------------------------------------------------------------
    Pete DeVries
    Department of Entomology
    University of Wisconsin - Madison
    445 Russell Laboratories
    1630 Linden Drive
    Madison, WI 53706
    GeoSpecies Knowledge Base
    About the GeoSpecies Knowledge Base
    ------------------------------------------------------------
    
    _______________________________________________
    Virtuoso-users mailing list
    https://lists.sourceforge.net/lists/listinfo/virtuoso-users






--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen




