Thank you, Mark. For now I'll just settle for an updated list of spider agents from COUNTER-Robots¹ (dropping the text file into dspace/config/spiders/agents seems to work).
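
For anyone following along, here is a rough sketch of what matching on user agents could look like. It is not DSpace's code: the class name `SpiderAgentSketch` and its methods are hypothetical. It assumes each file in the agents directory holds one regular expression per line, that lines starting with `#` are comments, and that matching is case-insensitive. The sample patterns are only illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DSpace.
public class SpiderAgentSketch {

    /** Compile one case-insensitive regex per non-blank, non-comment line (assumed file format). */
    public static List<Pattern> compile(List<String> lines) {
        List<Pattern> patterns = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            patterns.add(Pattern.compile(trimmed, Pattern.CASE_INSENSITIVE));
        }
        return patterns;
    }

    /** Read every regular file in a directory such as dspace/config/spiders/agents. */
    public static List<Pattern> loadDirectory(Path dir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    lines.addAll(Files.readAllLines(file));
                }
            }
        }
        return compile(lines);
    }

    /** True if any pattern is found anywhere in the user agent string. */
    public static boolean isSpider(String userAgent, List<Pattern> patterns) {
        if (userAgent == null) {
            return false;
        }
        for (Pattern pattern : patterns) {
            if (pattern.matcher(userAgent).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // With a directory argument, load patterns from disk; otherwise use sample patterns.
        List<Pattern> patterns = args.length == 1
                ? loadDirectory(Paths.get(args[0]))
                : compile(Arrays.asList("# sample patterns", "bot", "^Java/"));
        for (String agent : Arrays.asList(
                "Mozilla/5.0 (compatible; Googlebot/2.1)",
                "Mozilla/5.0 (X11; Linux x86_64) Firefox/70.0")) {
            System.out.println(agent + " -> " + isSpider(agent, patterns));
        }
    }
}
```

Marking existing Solr records by user agent would presumably need a loop like the one in `isSpider` over every loaded pattern, which is the iteration that markRobotByUserAgent() appears to lack.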

Regards,

¹ https://github.com/atmire/COUNTER-Robots

On Tue, Nov 5, 2019 at 4:02 PM Mark H. Wood <[email protected]> wrote:

> On Mon, Nov 04, 2019 at 11:10:25PM +0200, Alan Orth wrote:
> > The DSpace 5.x (and presumably 6.x) documentation[0] suggests that it is
> > possible to mark existing Solr statistics records as being bots or spiders
> > using the following command:
> >
> > $ dspace stats-util -m
> >
> > After trying to test this with an updated list of user agents[1] for a
> > while I realized that the feature is only implemented for IPs. As it stands
> > right now the code in StatisticsClient.java only marks robots based on
> > their IPs, but not on their user agents or domains:
> >
> > else if (line.hasOption('m'))
> > {
> >     SolrLogger.markRobotsByIP();
> > }
> >
> > Strangely enough, SolrLogger has a markRobotByUserAgent() function that is
> > never called anywhere in the Java code base (also it seems to only be
> > partially implemented, as it does not iterate over agents).
> >
> > Should I file a bug? This issue affects DSpace 5.x and 6.x for sure.
>
> https://jira.duraspace.org/browse/DS-2431
>
> There are several Issues related to completing the work on extended
> spider marking and filtering.
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>
> --
> All messages to this mailing list should adhere to the DuraSpace Code of
> Conduct: https://duraspace.org/about/policies/code-of-conduct/
> ---
> You received this message because you are subscribed to the Google Groups
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-tech/20191105140039.GA30402%40IUPUI.Edu.

--
Alan Orth
[email protected]
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche

To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/CAKKdN4Uf43qw8WeX_6yrK25-qo%2BJ3QRF80w05f%3DggtWvCdoiKw%40mail.gmail.com.
