Hi,

Continuing the tangent:
We found it confusing when we compared item views for the top items in our repository: the numbers in the site-wide stats don't match the item-level stats. This immediately reduced our confidence in both stats pages: what exactly is being counted, and are these reliable sources to report on?

After some digging, we were surprised to learn that:

1) DSpace generates and displays stats from two different sources,
2) what's termed "legacy" stats is actually the current source for site-level stats,
3) traffic from bots inflates both the Solr-based and the log-based stats.

None of this is obvious from the UI or the documentation. At our institution, the repository manager is a non-technical person who (quite reasonably) took the stats at face value. She did not expect to need a long explanation from the system administrator (myself) of how the stats actually work.

Moving forward, it would be preferable if the same source were used for all stats displayed in the UI (site-wide and item-level). Further, the site-wide stats could be reviewed and brought up to date (log processing time? no thanks).

Cheers,
Anthony

-----Original Message-----
From: Mark H. Wood [mailto:[email protected]]
Sent: Wednesday, August 19, 2015 8:55 AM
To: [email protected]
Subject: Re: [Dspace-tech] Administrative Statistics

On Tue, Aug 18, 2015 at 02:12:18PM -0500, Tim Donohue wrote:
> The "Administrative Statistics" are the (very old) legacy DSpace
> statistics pulled from log files, which pre-dated the Usage Statistics
> (based on Solr). They are only generated by running these commandline
> options:
>
> [dspace]/bin/dspace stat-initial
> [dspace]/bin/dspace stat-general
> [dspace]/bin/dspace stat-monthly
>
> The only reason they still exist is that the Usage Statistics don't
> provide all the same information (yet). The Usage Statistics are much
> more accurate in providing usage information (as these legacy,
> log-based stats do not filter out spiders or similar). But, the
> legacy, log-based stats do provide some unique administrative
> statistics, like the counts of the number of actions performed in your
> DSpace, etc.

[tangent]

I don't find the situation confusing at all. Service administrators have different needs than contributors and editors. While the mechanism for gathering and storing sitewide admin. statistics might be improvable, I think we ought to look at bringing them up to date and fleshing them out with other information that admin.s would want.

As an example, people interested in the content will appreciate having robot accesses filtered out, but admin.s might profit from seeing filtered and unfiltered counts side by side. Even more so if they can sample these counts mechanically and accumulate them for visualization.

Some other stuff I get asked for includes simple counts of how much stuff we have: how many Items, how many Bitstreams, how many image Bitstreams. End users don't care about such things, but senior administrators do.

For the future, another thing that our biggest statistical consumers want very much is views aggregated by *author*. I'm looking forward to first-class support for author identities so that we can do this well.

[tired old refrain]

"Statistics" doesn't mean the same thing to everyone. It may not mean the same thing to *anyone*.

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

