Hi Stuart & all, I wanted to briefly mention that this also sounds similar to this bug ticket about crawlers getting "stuck" in facets of entities: https://github.com/DSpace/dspace-angular/issues/2709
There's a fix we've applied which will be in the 8.0 and 7.6.2 releases (once each is finished): https://github.com/DSpace/dspace-angular/pull/2710 (This approach has been approved by Google Scholar.)

This may not be the same thing that Stuart noticed, but it's definitely related. So, this is another way to lessen the crawler activity if you are seeing it in your DSpace 7 instance.

Tim

On Wednesday, May 1, 2024 at 3:56:18 PM UTC-5 [email protected] wrote:

> In the last couple of weeks we've had an issue with web crawlers getting
> lost in facets, crawling literally millions of URLs in the faceted Solr
> index. This is mainly a problem because some of them get quite expensive in
> terms of Solr search (CPU and memory consumption of the Solr component
> rises).
>
> We've deployed the following fix:
>
> #added to redirect long solr queries back to the homepage
> RewriteEngine On
> RewriteCond "%{QUERY_STRING}" "filter_3"
> RewriteRule . https://ir.wgtn.ac.nz/ [R]
>
> The "filter_3" means that users and crawlers are allowed two facets deep
> before being redirected back to the homepage.
>
> We're redirecting to our own homepage; others will probably want to
> redirect to their own homepages (and/or bot tarpits).
>
> cheers
> stuart
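For anyone wanting to check which URLs a condition like Stuart's would catch before deploying it, here is a rough Python sketch. It only imitates the RewriteCond test (an unanchored match of "filter_3" against the raw query string); it is not a substitute for testing against Apache itself. The function name would_redirect and the repo.example.org URLs are made up for illustration, and the filter_1 / filter_2 parameter names are assumed from the "filter_3" in the rule rather than taken from any particular DSpace version.

```python
import re
from urllib.parse import urlsplit

# Same literal, unanchored pattern as the RewriteCond in the quoted message.
FACET_PATTERN = re.compile(r"filter_3")


def would_redirect(url: str) -> bool:
    """Return True if the URL's raw query string contains the pattern."""
    query = urlsplit(url).query  # not percent-decoded, like %{QUERY_STRING}
    return FACET_PATTERN.search(query) is not None


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    samples = [
        "https://repo.example.org/discover?filter_1=a&filter_2=b",
        "https://repo.example.org/discover?filter_1=a&filter_2=b&filter_3=c",
        "https://repo.example.org/discover?filter_30=z",
        "https://repo.example.org/filter_3/page",
    ]
    for url in samples:
        print(would_redirect(url), url)
```

Because the pattern is unanchored, a parameter such as filter_30 would match as well, while "filter_3" appearing only in the path would not, since only the query string is tested. If that matters for a given site, a tighter pattern may be worth considering.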
