Hi Marijka,

We experienced the same problem with bot traffic. It was so intense that our server could not handle it. We decided to use a commercial service offered by our university IT office, called BIG-IP: https://www.f5.com/products/big-ip-services/advanced-firewall-manager
Basically, it handles all requests targeted at our DSpace server before they reach it. It required a few weeks of fine-tuning (during the first days, even the API calls from the Angular UI to the backend were blocked). It solves the bot traffic problem but, as you pointed out, there are still performance issues in DSpace that will need to be tackled.

Thanks!

Best,
Pierre

On Monday, September 23, 2024 at 8:06:20 AM UTC-4 Marijka Azzopardi wrote:

> Hi all,
>
> I'd like to ask about your experience in *managing repository server performance and stability issues,* particularly as a result of high-bot traffic to your DSpace repository.
>
> To provide some background, at UNSW Sydney, our DSpace7 repository has been experiencing an increase in performance and stability issues due to the heavy load being placed on our repository from several contributing factors, including increased bot and crawler traffic.
>
> We plan to upgrade from DSpace v7.0 to v7.6.2 to optimise our server performance, and access vital bug fixes and functionality that addresses this, such as Caching of server-side rendered pages <https://wiki.lyrasis.org/display/DSDOC7x/Performance+Tuning+DSpace#PerformanceTuningDSpace-Turnon(orincrease)cachingofServer-SideRenderedpages>.
>
> I am aware though that performance and scalability issues are still being reported by DSpace v7.6.2 and v8 repository owners <https://github.com/DSpace/dspace-angular/issues/3110> and, as a result, solutions are being pushed for prioritisation in a future DSpace 9 release (tentative Apr 2025) <https://wiki.lyrasis.org/display/DSPACE/DSpace+Release+9.0+Status#:~:text=Improving%20performance%20/%20scalability,angular/issues/3163>.
>
> In the interim, although this may have reduced impact, we’re looking to update our robots.txt “disallow” rules to prevent bot crawling of unnecessary repository pages and reduce the number of requests made to our server by directing ‘compliant’ search engine crawlers directly to repository metadata and files.
>
> I would be very interested to know how your institution may be managing server performance and stability issues, if you have updated your robots.txt to direct crawler traffic and block bots, or have implemented any other solutions e.g. integration with third-party software such as Redis <https://redis.io/> (used by Jagiellonian University <https://github.com/DSpace/dspace-angular/issues/3110#:~:text=There%20is%20the%20solution%20we%20used%20to%20replace%20Server%2DSide%20rendered%20pages%20with%20Redis%20in%20the%20case%20study%20on%20the%20Jagiellonian%20University%20Repository%3A%0AReplace%20the%20build%2Din%20Server%2DSide%20Rendered%20pages%20with%20Redis.pdf>) to cache server-side rendered pages in DSpace, or Cloudflare <https://www.cloudflare.com/en-au/> (Content Delivery Network), etc.
>
> Looking forward to hearing from you and a thanks in advance for your time!
>
> Kind regards,
>
> *Marijka Azzopardi*
> Repository Librarian
> Scholarly Communications & Repositories, UNSW Library
> UNSW SYDNEY 2052

--
All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups "DSpace Community" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-community/49048dcc-362f-4b0f-befb-05cbab5f2c8fn%40googlegroups.com.
