Hi Stuart & all,

I wanted to briefly mention that this also sounds similar to the bug ticket 
about crawlers getting "stuck" in facets of 
entities: https://github.com/DSpace/dspace-angular/issues/2709

There's a fix we've applied which will be in the 8.0 and 7.6.2 releases 
(once each is 
finished): https://github.com/DSpace/dspace-angular/pull/2710  (This 
approach has been approved by Google Scholar)

This may not be the same thing that Stuart noticed, but it's definitely 
related.  So, this is another way to lessen the crawler activity if you are 
seeing it in your DSpace 7 instance.

Tim

On Wednesday, May 1, 2024 at 3:56:18 PM UTC-5 [email protected] wrote:

> In the last couple of weeks we've had an issue with web crawlers getting 
> lost in facets, crawling literally millions of URLs in the faceted Solr 
> index. This is mainly a problem because some of these queries get quite 
> expensive in terms of Solr search (CPU and memory consumption of the Solr 
> component rises).
>
> We've deployed the following fix:
>
>         #added to redirect long solr queries back to the homepage
>         RewriteEngine On
>         RewriteCond "%{QUERY_STRING}" "filter_3"
>         RewriteRule .  https://ir.wgtn.ac.nz/ [R]
>
> The "filter_3" condition means that users and crawlers can go two facets 
> deep before being redirected back to the homepage.
>
> We're redirecting to our own homepage; others will probably want to 
> redirect to their own homepages (and/or bot tarpits).
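>
> A possible variant of the rule above, as a sketch only: by default 
> mod_rewrite carries the original query string over to the redirect 
> target, so the QSD flag (Apache 2.4 or later) is added here to drop it, 
> and L stops further rule processing. The example.org URL is a 
> placeholder, not a real site.
>
>         # sketch: redirect deep facet queries and discard the query string
>         RewriteEngine On
>         RewriteCond "%{QUERY_STRING}" "filter_3"
>         RewriteRule . https://repository.example.org/ [R=302,QSD,L]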
>
> cheers
> stuart
>
>
>
>

-- 
All messages to this mailing list should adhere to the Code of Conduct: 
https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-community/bd9d62d7-8d94-44c7-8a67-7ce38378f47fn%40googlegroups.com.
