Related notes:

1) Our fix is for DSpace 6.3; sorry, I should have said this in the first email.
2) Some of the issues we're seeing appear to be from non-Google crawlers (based on reverse IP lookup and user agent string analysis).
3) Our fix works even for web crawlers which do not follow robots.txt.
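
For anyone adapting the rule quoted further down in this thread, here is one possible tighter variant. It is a sketch only, not the configuration we deployed, and it rests on assumptions: Apache 2.4 or later (for the QSD flag), mod_rewrite enabled, and the facet value arriving as a query parameter literally named filter_3. The homepage URL is ours and would need replacing.

```apache
# Sketch only: assumes Apache 2.4+ with mod_rewrite loaded.
RewriteEngine On

# Match filter_3 only as a parameter name, either at the start of the
# query string or right after an "&" (assumption: the parameter is
# literally named filter_3 and is followed by "=").
RewriteCond "%{QUERY_STRING}" "(^|&)filter_3="

# Redirect to the homepage with an explicit 302, discard the original
# query string (QSD), and stop processing further rewrite rules (L).
# Replace the URL with your own homepage.
RewriteRule "." "https://ir.wgtn.ac.nz/" [R=302,L,QSD]
```

The QSD flag matters because mod_rewrite otherwise passes the original query string through to the redirect target, so the homepage request would still carry filter_3. Whether that is a problem depends on where the rule is placed, so please test before relying on it.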
cheers
stuart
--
...let us be heard from red core to black sky

On Thu, 2 May 2024 at 09:13, DSpace Community <[email protected]> wrote:

> Hi Stuart & all,
>
> I wanted to briefly mention that this also sounds similar to this bug ticket
> about crawlers getting "stuck" in facets of entities:
> https://github.com/DSpace/dspace-angular/issues/2709
>
> There's a fix we've applied which will be in the 8.0 and 7.6.2 releases
> (once each is finished):
> https://github.com/DSpace/dspace-angular/pull/2710 (This approach has
> been approved by Google Scholar)
>
> This may not be the same thing that Stuart noticed, but it's definitely
> related. So, this is another way to lessen the crawler activity if you are
> seeing it in your DSpace 7 instance.
>
> Tim
>
> On Wednesday, May 1, 2024 at 3:56:18 PM UTC-5 [email protected] wrote:
>
>> In the last couple of weeks we've had an issue with web crawlers getting
>> lost in facets, crawling literally millions of URLs in the faceted solr
>> index. This is mainly a problem because some of them get quite expensive in
>> terms of solr search (CPU and memory consumption of the solr component
>> rises).
>>
>> We've deployed the following fix:
>>
>> #added to redirect long solr queries back to the homepage
>> RewriteEngine On
>> RewriteCond "%{QUERY_STRING}" "filter_3"
>> RewriteRule . https://ir.wgtn.ac.nz/ [R]
>>
>> The "filter_3" means that users and crawlers are allowed two facets deep
>> before being redirected back to the homepage.
>>
>> We're redirecting to our own homepage; others will probably want to
>> redirect to their own homepages (and/or bot tarpits).
>>
>> cheers
>> stuart
>>
>> --
> All messages to this mailing list should adhere to the Code of Conduct:
> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "DSpace Community" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/dspace-community/1-J8xg1ZrF8/unsubscribe
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-community/bd9d62d7-8d94-44c7-8a67-7ce38378f47fn%40googlegroups.com

To view this discussion on the web visit
https://groups.google.com/d/msgid/dspace-community/CAC_Lu0a2YrwrmdLs%2BbxzDHU%2Bu_DQSLefBfJvB6nUYZZgTbfObg%40mail.gmail.com
