Hi Tint,

> Our repository is running on 1.6.2 and we have been using Solr for a few 
> months now. There seems to be some problem with Solr statistics. Bitstreams 
> for some items were downloaded more than a few thousand times within a month 
> from the same place. How can I filter out such systematic access (by 
> bots/spiders etc)?

Take a look at the following tool:

/dspace/bin/dspace stats-util -h
usage: StatisticsClient
       
 -b,--reindex-bitstreams          Reindex the bitstreams to ensure we have
                                  the bundle name
 -r,--remove-deleted-bitstreams   While indexing the bundle names remove
                                  the statistics about deleted bitstreams
 -u,--update-spider-files         Update Spider IP Files from internet
                                  into /dspace/config/spiders
 -f,--delete-spiders-by-flag      Delete Spiders in Solr By isBot Flag
 -i,--delete-spiders-by-ip        Delete Spiders in Solr By IP Address
 -m,--mark-spiders                Update isBot Flag in Solr
 -h,--help                        help
 -o,--optimize                    Run maintenance on the SOLR index
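
For example, a typical cleanup run might look something like the sketch below. This is only an illustration, not a definitive recipe: the paths assume a standard /dspace installation, and you should check each step against your own setup before running it.

```
# 1. Fetch the latest known-spider IP lists into /dspace/config/spiders
/dspace/bin/dspace stats-util -u

# 2. Mark existing hits from those IPs with the isBot flag in Solr
/dspace/bin/dspace stats-util -m

# 3. Delete all hits flagged as bots from the statistics core
/dspace/bin/dspace stats-util -f

# 4. Run maintenance on the Solr index afterwards
/dspace/bin/dspace stats-util -o
```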


You might first need to register the IP addresses of the bots in 
/dspace/config/spiders/
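
The spider files are plain text lists of IP addresses, one per line. Something like the following, where the addresses are just examples and the file name is up to you:

```
66.249.66.1
72.30.65.43
207.46.13.52
```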

I hope that helps,


Stuart Lewis
Digital Development Manager
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928


_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech