Dear DSpace Community,

Statistics
1. From a "what works" perspective, there are already beautiful statistics implementations addressing the minimum requirements. I think the IDEALS repository has what I would be very happy with; these guys seem to be one step ahead: http://www.ideals.uiuc.edu
   I can remember asking Tim Donohue about their implementation a few years ago; he said it was a very customised solution (please correct me if I am wrong). I also find the EPrints and Fez Fedora stats pretty good.

2. Develop a package that delivers via both the JSP and the XML (Manakin) interfaces.

3. Keep it fairly compartmentalised/simple if possible, and quarantine the requirements into 3 distinct areas:
   a) Item Statistics - downloads, with additional extras like authors and collections
   b) Site Trends - traffic sources, countries, etc.; piggyback on tools like Google Analytics or the other web analyser tools that Mark Wood mentions
   c) More complex reporting that meets specific requirements

Many thanks for the opportunity to be part of the discussion. We are very isolated in New Zealand but struggling with all the same problems everyone else is experiencing... it helps to move forward. Time zones don't allow any online interaction; it will be 4am here.

Leonie Hayes
Research Repository Librarian
http://www.library.auckland.ac.nz/contacts/?firstname=&lastname=hayes
http://researchspace.auckland.ac.nz

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, 26 August 2008 4:03 a.m.
To: [email protected]
Subject: Dspace-general Digest, Vol 61, Issue 19

Send Dspace-general mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://mailman.mit.edu/mailman/listinfo/dspace-general
or, via email, send a message with subject or body 'help' to
        [EMAIL PROTECTED]

You can reach the person managing the list at
        [EMAIL PROTECTED]

When replying, please edit your Subject line so it is more specific than "Re: Contents of Dspace-general digest..."

Today's Topics:

   1. Week 2: Statistics (Dorothea Salo)
   2. Re: Week 2: Statistics (Dorothea Salo)
   3. Re: Week 2: Statistics (Mark H. Wood)

----------------------------------------------------------------------

Message: 1
Date: Mon, 25 Aug 2008 08:08:47 -0500
From: "Dorothea Salo" <[EMAIL PROTECTED]>
Subject: [Dspace-general] Week 2: Statistics
To: dspace <[email protected]>, "DSpace Tech-List" <[EMAIL PROTECTED]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=UTF-8

Greetings, DSpace community,

I want to thank everyone once again for last week's stimulating discussion and impressive chat turnout! I have a new question for everyone this week, pursuant to some discussion on the lists:

"Statistics" are one of the commonest requests for a new DSpace feature. Without further specification, however, it's hard to know what data to present, since there are no standards or even clear best practices in this area.

What statistics do the following groups of DSpace users need to see, and in what form are the statistics best presented to them?

   Depositors
   End-users (defined as "people examining items and downloading bitstreams from a DSpace instance;" we may have to refine this further in discussion)
   DSpace repository managers (as distinct from systems administrators)

What else should developers keep in mind as they implement this feature?

Because it would be nice to reach a working consensus on this (unlike last week's question, which was intended to pull out as broad a selection of needs as possible), I think we should start discussing immediately. I encourage all respondents to respond TO THE MAILING LIST instead of to me.

I will be holding another chat to discuss the weekly question. It will take place Wednesday 27 August in the DSpace IRC chatroom, #dspace on irc.freenode.net. I apologize to West Coast (USA) community members for last week's unconscionably early hour; we'll try 10 am US Central (11 am Eastern, 4 pm GMT) this week, and we may go even later next week if our European community members can stand it.

For those who don't normally use IRC, there are two easy web gateways. One is mibbit.com; the other is specific to our channel and can be found at <http://dspace.testathon.net/cgi-bin/irc.cgi>. I encourage all of us to become familiar with the channel; it is a source of real-time technical information from DSpace developers, as well as a community in its own right.

Dorothea

--
Dorothea Salo
[EMAIL PROTECTED]
Digital Repository Librarian
AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493

------------------------------

Message: 2
Date: Mon, 25 Aug 2008 09:07:43 -0500
From: "Dorothea Salo" <[EMAIL PROTECTED]>
Subject: Re: [Dspace-general] Week 2: Statistics
To: dspace <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=UTF-8

My answers:

> What statistics do the following groups of
> DSpace users need to see, and in what form are the statistics best
> presented to them?
>
> Depositors

At a minimum, I would like depositors to see the number of times an item's splash page has been visited, and the number of times each content bitstream (as distinct from e.g. thumbnails) has been downloaded.
I would also like aggregate statistics available for each author in the system, though I recognize that this creates authority-control and role-evaluation issues. (For example, if Dr. Helen Troia is the author of articles in the repository, the editor of a journal whose backfiles are in the repository, as well as a thesis advisor for some theses in the thesis collection, the journal and the theses should NOT count toward her downloads.)

HTML items (and similar aggregates, once we can work with them; e.g. Flash objects) cause trouble for bitstream analysis. To cut through the jungle, I suggest that only the primary bitstream have its accesses counted. If possible, it would be nice to count accesses for all HTML bitstreams, but that can be lived without if need be.

I don't believe these statistics need to be real-time; a daily or even weekly cron-job would suffice. I do believe we need to take into account when an item was ingested, recognizing that older items will pile up the downloads over time. In addition to total-aggregates, then, I would recommend "in the last week," "in the last month," and "in the last year/since ingest" information. Per-calendar-year information should be kept and displayed indefinitely, even if the underlying data are eventually purged, because authors will use this in tenure-and-promotion packages.

A sense of delta would be nice as well -- depositors would LOVE to know if suddenly an item's downloads spike.

Other desiderata, less important: broad-brush geographic information (country of origin? Google Maps mashup?) for accesses, per-collection and per-community access counts (because it NEVER hurts to get a sense of competition going), search terms (in DSpace itself or from search engines) that land people at a particular item.
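
[Illustration] The windowed counts described above ("in the last week," "in the last month," "in the last year," plus per-calendar-year totals) could be computed along the following lines. This is only a rough Python sketch under stated assumptions: the function name summarize_downloads, the idea that download events are available as a plain list of dates, and the 7/30/365-day window lengths are all hypothetical and do not come from DSpace itself.

```python
from collections import Counter
from datetime import date, timedelta


def summarize_downloads(download_dates, today):
    """Summarize one item's downloads (hypothetical helper, not a DSpace API).

    download_dates: list of datetime.date, one entry per counted download.
    today: the date the report is generated (e.g. by a nightly cron job).
    """
    # Assumed window lengths in days; a real deployment might choose others.
    windows = {"last_week": 7, "last_month": 30, "last_year": 365}

    summary = {"total": len(download_dates)}
    for label, days in windows.items():
        cutoff = today - timedelta(days=days)
        # Count downloads strictly after the cutoff, up to and including today.
        summary[label] = sum(1 for d in download_dates if cutoff < d <= today)

    # Per-calendar-year totals are small enough to keep indefinitely,
    # even if the raw download events are purged later.
    summary["per_year"] = dict(Counter(d.year for d in download_dates))
    return summary
```

Because the function takes the report date as an argument rather than reading the clock, a daily or weekly batch job could call it once per item and store the result, and the per-year totals could be kept even after the raw events are gone.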

> End-users (defined as "people examining items and downloading
> bitstreams from a DSpace instance;" we may have to refine this further
> in discussion)

I think end-users can usefully be shown the per-item and per-bitstream information discussed above. They don't need to see per-author information -- or at the very least, authors should be able to decide whether to make this information public. (We do NOT want to embarrass anyone; that's a serious turnoff for our potential depositors.)

> DSpace repository managers (as distinct from systems administrators)

I get survey after survey asking for activity information on the repository. I can't answer them. To do so, I need download information on the whole repository. (Current JSPUI statistics offer an approximation to this, but I'm very leery of trusting it; I don't understand how it's calculated, and the numbers seem incredibly off to me.)

I am sometimes asked about growth rate in accesses, so it would be useful to break this down by year. Some algorithm for breaking it down by amount of content in the repository ("downloads-per-item," where "item" would have to be some kind of average of items-in-repository over the period examined) would be useful as well. (And yes, I absolutely loathe those surveys too, but when they come from ARL, I don't have the luxury of ignoring them.)

Some "wow" numbers would be useful for marketing purposes. A lot of what I've already described would do the trick there.

I would also like to be able to track deposits per collection/community over time; this helps me know where to focus marketing and collection-development efforts, as well as helping me report progress to the appropriate administrators. (I run a system-wide repository, so I have to track deposits by campus; each campus has its own community.)

> What else should developers keep in mind as they implement this feature?

Search-engine crawlers. Excluding them provides a much more realistic sense of interest.
We need to make clear this is happening, though, or we will be at a perceived disadvantage relative to repositories that don't strip out these accesses.

Dorothea

--
Dorothea Salo
[EMAIL PROTECTED]
Digital Repository Librarian
AIM: mindsatuw
University of Wisconsin
Rm 218, Memorial Library
(608) 262-5493

------------------------------

Message: 3
Date: Mon, 25 Aug 2008 10:55:20 -0400
From: "Mark H. Wood" <[EMAIL PROTECTED]>
Subject: Re: [Dspace-general] Week 2: Statistics
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="us-ascii"

One thing to keep in mind about whole-site statistical tables is that there are already tools to do this for web sites in general, such as AWStats or Webalizer or whatever your favorite may be. We probably should not spend effort to try to duplicate those.

Another consideration is that there are stat.s which would be useful anytime, and stat.s that you dream up once and may never use again, or may only find interesting at irregular intervals.

So I think we should be careful not to try to do too much ourselves. We can have some generally-useful stuff built in, but we also need ways to expose the raw cases in a useful form for ad-hoc analysis with general-purpose statistical tools (SPSS/BMD/SAS/Stata/R/whatever).

Stuff to be inserted as one component of e.g. an item page probably needs to be built in. Stuff that would be a page on its own should perhaps not be part of DSpace at all, but rather something we make easy to do with other tools.

We need to keep clearly in mind the distinction between capturing raw cases (someone fetched a bitstream) and abstracting useful patterns from the collected cases (frequency histogram of this collection's fetches over time, last month's fetches broken down by nation of origin).

What might be helpful is to provide some views or stored procedures that stat. tools could use to classify observations.
Such tools usually have good facilities for poking around in databases, but could perhaps use help in getting the information they need without having to understand (and track changes to!) the fullness of DSpace's schema.

--
Mark H. Wood, Lead System Programmer
[EMAIL PROTECTED]
Typically when a software vendor says that a product is "intuitive" he means the exact opposite.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mailman.mit.edu/pipermail/dspace-general/attachments/20080825/1477891f/attachment-0001.bin

------------------------------

_______________________________________________
Dspace-general mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/dspace-general

End of Dspace-general Digest, Vol 61, Issue 19
**********************************************
