without getting into whether event streams should be logged to a file or to a database, this is probably the way to go in general, though i would recommend doing it on a broader scale so that analysis tools are interoperable among the major repository software systems. (there was some research on an XML log file format a while back, but it did not go far.)
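to make the interoperability point concrete, here is a rough sketch of the kind of repository-neutral event record such a shared format might carry. the class and field names are hypothetical and not tied to DSpace or to any existing standard; it is only meant to illustrate the idea of a common, parseable event stream.

/** a hypothetical, repository-neutral usage event; names are illustrative only */
public class UsageEvent {
    private final String timestamp;   // ISO 8601, e.g. "2008-08-27T09:46:00Z"
    private final String eventType;   // e.g. "view", "download", "search"
    private final String objectId;    // persistent identifier of the item or bitstream
    private final String agent;       // anonymized session or user-agent token

    public UsageEvent(String timestamp, String eventType, String objectId, String agent) {
        this.timestamp = timestamp;
        this.eventType = eventType;
        this.objectId = objectId;
        this.agent = agent;
    }

    /** serialize to a simple XML fragment that any analysis tool could parse
        (real code would escape the attribute values) */
    public String toXml() {
        return "<event time=\"" + timestamp + "\" type=\"" + eventType
                + "\" object=\"" + objectId + "\" agent=\"" + agent + "\"/>";
    }

    public static void main(String[] args) {
        UsageEvent e = new UsageEvent("2008-08-27T09:46:00Z", "download",
                "hdl:1234/5678", "session-42");
        System.out.println(e.toXml());
    }
}

whether such records end up in a database table or in a log file, the point is that one stream could feed several different analysis tools, on any repository platform.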
ttfn,
----hussein

=====================================================================
hussein suleman ~ [EMAIL PROTECTED] ~ http://www.husseinsspace.com
=====================================================================

Randy Stern wrote:
> One useful distinction is to separate to some degree the statistics that we may want to calculate from the events/raw data that needs to be recorded by the DSpace system as it operates. As long as the events are recorded in the database (preferably *not* logged in files), various computations, aggregations, reports, and APIs for exposing that data can be generated later. So we may want to focus initially on what data to record, and plan for a statistics data model, database tables, and recording to be built into DSpace 2.0.
>
> At 09:46 AM 8/27/2008 -0400, Mark H. Wood wrote:
>> On Tue, Aug 26, 2008 at 06:13:14PM -0500, Dorothea Salo wrote:
>>> 2008/8/26 Mark H. Wood <[EMAIL PROTECTED]>:
>> [snip]
>>> This is such an interesting statement that I think I will make it next week's topic! What *is* excellent document repository software? I have a feeling that the non-developer community may have a rather different take on it from most developers... we'll see if I'm right.
>> I think you are, and I look forward to that discussion!
>>
>>>> This is one reason why I think that it should be as easy as possible for multiple statistics projects to tap into built-in streams of observations. Different sites have different needs, and I think we need to be able to easily play with various ways of doing statistics.
>>> Agreed, but just to toss this out: I foresee a countervailing pressure in future toward standardized and aggregated statistics across repositories. I have heard a number of statements to the effect that faculty are using download counts from disciplinary repositories in tenure-and-promotion packages. As their work becomes scattered and/or duplicated across various repositories, they're going to want to aggregate that information.
>> Quite so. I just don't feel that we've yet got to the point at which we understand how to do that well. A lot of good solutions come about in this way: an abstract and somewhat indistinct common need is recognized; a number of people all go off in different directions and try things; solutions are compared, borrow from each other, coalesce; finally a now well-understood need finds itself fulfilled with one or two mature implementations. I feel that we're still deep in the "try things" phase.
>>
>> The degree to which statistics are desired and used suggests that, in addition to traditional reports, we should be thinking in terms of exposing statistical products in machine-readable form. I have been thinking for some time that we might, with reasonable effort, help to work out a lingua franca for exchanging usage statistics among repositories of various "brands" so that the utility of various ideas, and the behavior of repository users, might be studied more effectively. But again, what we can all agree on will very likely be a small subset of what we can individually envision.
>>
>> This really ought to be considered early on, because if we can come up with a common theme in the abstract, then machine- and human-readable reporting become side-by-side layers on top of the pool of statistical data products, and both will be easier to think about if they are merely formatting something already produced. Likewise, the production of those statistics will be easier to think about if presentation issues can be separated from the task.
>>
>> I do *not* mean to say here that the statistics that people want now should have to wait indefinitely on some Grand Scheme to do it all. It would be better to organize the development in successive approximations if it looks like taking too long to do it all in one push. It's probably going to take several years to fully realize satisfactory monitoring and reporting of DSpace usage, but that doesn't mean that we can't provide better and better approximations much sooner.
>>
>> --
>> Mark H. Wood, Lead System Programmer [EMAIL PROTECTED]
>> Typically when a software vendor says that a product is "intuitive" he means the exact opposite.
>
> Randy Stern
> Manager of Systems Development
> Harvard University Library Office for Information Systems
> 90 Mount Auburn Street
> Cambridge, MA 02138
> Tel. +1 (617) 495-3724
> Email <[EMAIL PROTECTED]>
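one more sketch, appended below the quoted thread: the layering Mark describes, with machine- and human-readable reporting as parallel layers over a shared pool of statistical data products, might look roughly like this. the class and method names are hypothetical, the numbers are made up, and this is not a proposed DSpace API; it only illustrates keeping the production of a data product separate from its formatting.

import java.util.LinkedHashMap;
import java.util.Map;

/** hypothetical sketch: statistics production kept separate from presentation */
public class StatsLayers {

    /** the "data product": aggregated counts, independent of any output format */
    static Map<String, Long> monthlyDownloads() {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        counts.put("2008-07", 1234L);   // illustrative numbers only
        counts.put("2008-08", 1580L);
        return counts;
    }

    /** machine-readable layer: the same product exposed for harvesting or exchange */
    static String asCsv(Map<String, Long> counts) {
        StringBuilder sb = new StringBuilder("month,downloads\n");
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            sb.append(e.getKey()).append(',').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** human-readable layer: the same product formatted as a simple report */
    static String asReport(Map<String, Long> counts) {
        StringBuilder sb = new StringBuilder("Downloads by month\n");
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            sb.append("  ").append(e.getKey()).append(" : ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> product = monthlyDownloads();  // produced once
        System.out.println(asCsv(product));              // exposed to machines
        System.out.print(asReport(product));             // and formatted for people
    }
}

either layer can change, or new ones can be added (a reporting UI, an aggregator feed), without touching how the counts themselves are produced.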
