Hi Tint,
> Our repository is running on 1.6.2 and we have been using solr for a few
> months now. There seems to be some problem with the solr statistics. Bitstreams
> for some items were downloaded several thousand times within a month
> from the same place. How can I filter out such systematic access (by
> bots/spiders etc)?
Take a look at the following tool:
/dspace/bin/dspace stats-util -h
usage: StatisticsClient
 -b,--reindex-bitstreams          Reindex the bitstreams to ensure we have
                                  the bundle name
 -r,--remove-deleted-bitstreams   While indexing the bundle names remove
                                  the statistics about deleted bitstreams
 -u,--update-spider-files         Update Spider IP Files from internet
                                  into /dspace/config/spiders
 -f,--delete-spiders-by-flag      Delete Spiders in Solr By isBot Flag
 -i,--delete-spiders-by-ip        Delete Spiders in Solr By IP Address
 -m,--mark-spiders                Update isBot Flag in Solr
 -h,--help                        help
 -o,--optimize                    Run maintenance on the SOLR index
You might first need to register the IP addresses of the bots in
/dspace/config/spiders/
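
For example, assuming you have listed the offending addresses in a file
under /dspace/config/spiders/ (the exact file name is up to you; I believe
stats-util picks up the files in that directory), a typical clean-up
sequence based on the options above would be:

  /dspace/bin/dspace stats-util -u   # (optional) fetch known spider IP lists
  /dspace/bin/dspace stats-util -m   # set the isBot flag on matching hits
  /dspace/bin/dspace stats-util -f   # delete the flagged spider hits from Solr
  /dspace/bin/dspace stats-util -o   # optimize the Solr index afterwards

Run -m and -f on a test copy first if you can, since the deletions are not
reversible.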
I hope that helps,
Stuart Lewis
Digital Development Manager
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech