[ 
https://jira.duraspace.org/browse/DS-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=27692#comment-27692
 ] 

Andrea Schweer commented on DS-1482:
------------------------------------

I don't know how the demo site is set up, but in one of "my" repositories the 
items certainly don't all have the same date: 
http://researchcommons.waikato.ac.nz/sitemap?map=0
The sitemap job runs once a day via cron on that box. I added the link to 
robots.txt and I see the sitemap being requested by Googlebot and other 
crawlers.
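
To check whether the items in a sitemap really carry different dates, 
something like the following could tally the <lastmod> values. This is only a 
rough sketch: the sample XML and the repo.example.org URLs are made up for 
illustration, and it assumes the sitemap follows the sitemaps.org 0.9 schema 
and includes <lastmod> elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data; a real check would read the sitemap file instead.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://repo.example.org/handle/123/1</loc>
    <lastmod>2013-01-15</lastmod>
  </url>
  <url>
    <loc>http://repo.example.org/handle/123/2</loc>
    <lastmod>2013-01-15</lastmod>
  </url>
  <url>
    <loc>http://repo.example.org/handle/123/3</loc>
    <lastmod>2013-02-20T08:30:00Z</lastmod>
  </url>
</urlset>"""


def lastmod_counts(xml_text):
    """Count how many sitemap URLs share each last-modified day."""
    root = ET.fromstring(xml_text)
    days = []
    for element in root.findall("sm:url/sm:lastmod", SITEMAP_NS):
        if element.text:
            # Keep only the YYYY-MM-DD part so timestamps group by day.
            days.append(element.text.strip()[:10])
    return Counter(days)


if __name__ == "__main__":
    for day, count in sorted(lastmod_counts(SAMPLE).items()):
        print(day, count)
```

If every item ended up under a single day, that would point to the dates 
being reset when the sitemap is generated rather than reflecting real edits.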

I guess they really need content ordered by the time it was last modified, 
don't they? I'd expect them to want to re-crawl items after they've been 
edited. So Tim's first option sounds like a good idea to me, even though it's 
likely to be the one that involves more work...

Do we know what user agent the Google Scholar crawlers use? Or do they 
piggyback onto Googlebot?
                
> Add a way for harvesters to find recently added items (request from Google)
> ---------------------------------------------------------------------------
>
>                 Key: DS-1482
>                 URL: https://jira.duraspace.org/browse/DS-1482
>             Project: DSpace
>          Issue Type: New Feature
>            Reporter: Tim Donohue
>
> This request came out of a discussion I had with Anurag Acharya and Darcy 
> Darpa at Google / Google Scholar.
> Anurag mentioned that often the Google harvesters seem to need to do a lot of 
> "paging / clicking" in order to find new items in a DSpace instance.  This 
> can cause both a performance hit in DSpace (as the crawler keeps requesting 
> pages), and also can result in delays where items may not appear in Google 
> for some time (if the crawler gives up or moves on before it ever finds the 
> item).
> Anurag mentioned that it'd be much easier (both on DSpace performance and on 
> the Google crawlers), if DSpace provided some way to easily locate recently 
> added items.  
> This could be something like a "Browse Recently Added Items" (i.e. browse by 
> dc.date.accessioned), or similar.  It was noted that EPrints has such a 
> feature (called "Latest Additions").  For example, see their demo site:
> http://demoprints.eprints.org/cgi/latest
> It's also worth noting this could be as simple as adding a "More..." 
> option to our existing "Recently Added" list (of 5 items), so that you can 
> see other recently added items.


_______________________________________________
Dspace-devel mailing list
https://lists.sourceforge.net/lists/listinfo/dspace-devel