The taxonconcept.org demo site has a sitemap file at

http://lod.taxonconcept.org/sitemap.xml

The geospecies.org site has a sitemap in two forms:

http://lod.geospecies.org/sitemap.gz

and

http://lod.geospecies.org/sitemap.xml (big)

This site does not have one yet:

http://ocs.geospecies.org/

FYI: The taxonconcept and ocs.geospecies sites are currently just tests,
and the ontology etc. will change.

I will look into crawling via the sitemap extension.
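
As a quick sanity check before pointing a crawler at these files, something like the following could list the URLs a sitemap actually exposes. This is only a rough sketch: the function name `sitemap_locations` and the inline sample document are made up for illustration, and it assumes the sitemap uses the standard sitemaps.org namespace. It does not interpret any Semantic Sitemap extension elements, and it is not how the Virtuoso crawler itself reads sitemaps.

```python
import gzip
import xml.etree.ElementTree as ET

# Standard sitemaps.org namespace (assumed; extension elements are ignored).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_locations(data: bytes) -> list[str]:
    """Return every <loc> URL found in a sitemap document.

    Accepts either plain XML or a gzip-compressed sitemap (like sitemap.gz).
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    return [
        el.text.strip()
        for el in root.iter(f"{{{SITEMAP_NS}}}loc")
        if el.text
    ]


# Hypothetical sample; the real sitemap contents may differ.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://lod.geospecies.org/index.rdf</loc></url>
  <url><loc>http://lod.geospecies.org/</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in sitemap_locations(SAMPLE):
        print(url)
```

To try it against a real file, the bytes downloaded from one of the sitemap URLs above can be passed in place of `SAMPLE`.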

- Pete

On Thu, Mar 11, 2010 at 6:59 PM, Kingsley Idehen <[email protected]> wrote:

> Peter DeVries wrote:
>
>> I have installed the new version of Virtuoso Open Source and I seem to be
>> having the same crawling problem
>> with the following example (very preliminary) data sets:
>>
>> It seems to access the first RDF page and then stop.
>>
>> http://lod.taxonconcept.org/
>>
>> http://ocs.geospecies.org/
>>
>> The RDF starts at "index.rdf"  http://ocs.geospecies.org/index.rdf
>>
>> but even if I specify that URI, I get the same behavior.
>>
>
> Do you have sitemaps with Semantic Extensions now? If so, let's have the
> URLs.
>
> There is a new crawler that also allows you to designate crawl paths etc.
>
> Kingsley
>
>>
>> Suggestions?
>>
>> - Pete
>>
>>
>> On Tue, Dec 29, 2009 at 10:56 AM, Hugh Williams <[email protected]> wrote:
>>
>>    Hi Peter,
>>
>>    This issue with the Virtuoso Crawler has been recreated and is
>>    scheduled to be fixed for the next release. A short-term
>>    workaround would be to query the Virtuoso SPARQL endpoint
>>    (/sparql) with the "Retrieve remote RDF data for all missing
>>    source graphs" option (get:soft pragma) set, for example:
>>
>>
>> http://demo.openlinksw.com/sparql?default-graph-uri=http%3A%2F%2Flod.geospecies.org%2Findex.rdf&should-sponge=soft&query=select+*+where+{%3Fs+%3Fp+%3Fo}&format=text%2Fhtml&debug=on
>>
>>
>>    Or use some of the other Virtuoso pragma options to tailor your query
>>    as required. Further details on IRI de-referencing and use of
>>    pragmas can be obtained from:
>>
>>    http://docs.openlinksw.com/virtuoso/rdfiridereferencing.html
>>
>>    Best Regards
>>    Hugh Williams
>>    Professional Services
>>    OpenLink Software
>>    Web: http://www.openlinksw.com
>>    Support: http://support.openlinksw.com
>>    Forums: http://boards.openlinksw.com/support
>>    Twitter: http://twitter.com/OpenLink
>>
>>    On 28 Dec 2009, at 18:16, Peter DeVries wrote:
>>
>>>    Hi!
>>>
>>>    I have installed the latest Virtuoso open source and I am having
>>>    trouble getting it to crawl my data set.
>>>
>>>    The crawler downloads the first nine pages but then stops.
>>>
>>>    This happens when the target is http://lod.geospecies.org/ or
>>>    http://lod.geospecies.org/index.rdf.
>>>
>>>    The same dataset (rdf pages) can be successfully crawled with Elmo.
>>>
>>>    I think that this has something to do with a preference for
>>>    crawling the .xhtml pages rather than the rdf pages.
>>>
>>>    Is there something I should be including in the crawler input screen?
>>>
>>>
>>>    Thanks!
>>>
>>>    - Pete
>>>
>>>
>>>
>>>    ----------------------------------------------------------------
>>>    Pete DeVries
>>>    Department of Entomology
>>>    University of Wisconsin - Madison
>>>    445 Russell Laboratories
>>>    1630 Linden Drive
>>>    Madison, WI 53706
>>>    GeoSpecies Knowledge Base
>>>    About the GeoSpecies Knowledge Base
>>>    ------------------------------------------------------------
>>>
>>>    _______________________________________________
>>>    Virtuoso-users mailing list
>>>    [email protected]
>>>
>>>    https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>>
>>
>>
>>
>>
>>
>>
>
>
> --
>
> Regards,
>
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
>
>
>
>
>


