Related notes:

1) Our fix is for DSpace 6.3; sorry, I should have said this in the first
email.
2) Some of the issues we're seeing appear to be from non-Google crawlers
(based on reverse IP lookup and user agent string analysis)
3) Our fix works for web crawlers which do not follow robots.txt.
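
For anyone adapting the rule quoted below, here is a minimal Python sketch of
the check it performs. It assumes the RewriteCond pattern is applied as an
unanchored regular expression against the raw query string. The function name
redirect_target and the sample query strings are hypothetical, for
illustration only, and are not part of the deployed configuration.

```python
import re
from typing import Optional

# Same unanchored pattern as the RewriteCond in the quoted fix.
PATTERN = re.compile(r"filter_3")

# Redirect target taken from the quoted RewriteRule.
HOMEPAGE = "https://ir.wgtn.ac.nz/"


def redirect_target(query_string: str) -> Optional[str]:
    """Return the redirect URL if the rule would fire, else None.

    Hypothetical helper: it models only the RewriteCond test,
    not Apache's handling of the request itself.
    """
    if PATTERN.search(query_string):
        return HOMEPAGE
    return None
```

For example, redirect_target("filter_1=a&filter_2=b") returns None, while the
same query string with "&filter_3=c" appended returns the homepage URL.
Because this is a substring match, a parameter such as filter_30 would also
trigger the redirect. The sketch covers only the condition: mod_rewrite passes
the original query string through on a redirect by default, so it may be worth
checking whether the redirected request matches the rule again (the QSD flag
in Apache 2.4 discards the query string).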

cheers
stuart
--
...let us be heard from red core to black sky


On Thu, 2 May 2024 at 09:13, DSpace Community <
[email protected]> wrote:

> Hi Stuart & all,
>
> I wanted to briefly mention that this also sounds similar to this bug ticket
> about crawlers getting "stuck" in facets of entities:
> https://github.com/DSpace/dspace-angular/issues/2709
>
> There's a fix we've applied which will be in the 8.0 and 7.6.2 releases
> (once each is finished):
> https://github.com/DSpace/dspace-angular/pull/2710  (This approach has
> been approved by Google Scholar)
>
> This may not be the same thing that Stuart noticed, but it's definitely
> related.  So, this is another way to lessen the crawler activity if you are
> seeing it in your DSpace 7 instance.
>
> Tim
>
> On Wednesday, May 1, 2024 at 3:56:18 PM UTC-5 [email protected] wrote:
>
>> In the last couple of weeks we've had an issue with web crawlers getting
>> lost in facets, crawling literally millions of URLs in the faceted Solr
>> index. This is mainly a problem because some of these queries get quite
>> expensive in terms of Solr search (CPU and memory consumption of the Solr
>> component rises).
>>
>> We've deployed the following fix:
>>
>>         #added to redirect long solr queries back to the homepage
>>         RewriteEngine On
>>         RewriteCond "%{QUERY_STRING}" "filter_3"
>>         RewriteRule .  https://ir.wgtn.ac.nz/ [R]
>>
>> The "filter_3" means that users and crawlers are allowed two facets deep
>> before being redirected back to the homepage.
>>
>> We're redirecting to our own homepage; others will probably want to
>> redirect to their own homepages (and/or bot tarpits).
>>
>> cheers
>> stuart
>>
>>
>>
>> --
> All messages to this mailing list should adhere to the Code of Conduct:
> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "DSpace Community" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/dspace-community/1-J8xg1ZrF8/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-community/bd9d62d7-8d94-44c7-8a67-7ce38378f47fn%40googlegroups.com
> <https://groups.google.com/d/msgid/dspace-community/bd9d62d7-8d94-44c7-8a67-7ce38378f47fn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

