Hi Peter, all,

On 13/03/15 07:35, Peter Dietz wrote:
> ES is equally guilty of being a statistics data source, by storing
> original/raw. So, statistics is something that complicates DSpace's
> role in preserving assets, since stats are a value-add, and not a core
> repository function. But, since repo managers enjoy statistics, we
> can't not offer statistics. I would however like to offload the role
> of stats to a third party, such as Google Analytics though.

I mentioned the new GA integration (pushing bitstream downloads to GA) to my repository managers. Some of them are still quite concerned, since their repositories have stats going back 5+ years. They were not happy about losing historical stats data (even keeping in mind how inaccurate it probably is).

> Back to the relevant discussion. Both SOLR and ES prefer to be just
> indexes, something that you could rebuild if necessary. If you have
> all dspace.log's you potentially could rebuild, but its very
> laborsome. I've considered having an alternative log file,
> logs/usage-stats.<date>.log, that was similar to the output of
> stats-log-exporter|convertor, and input of stats-log-importer. Thus,
> that would be the source of record, and the stats engines could
> rebuild from this. Currently more information is being stored in the
> stats engines than gets logged to dspace.log (useragent, hostname, ...).
>
> I've added the ability for SOLR to export its data to csv:
> https://github.com/DSpace/DSpace/commit/f57619d726c07535ce786a3f79e9c39d56fd9031
> So, potentially, one could run that regularly to have backup data
> points...

That's a good start, but your code only stores some of the data, similar to what is in the dspace.log files (actually, less than that, since your code discards information about the currently logged-in user -- not that this is necessarily bad, since it isn't shown in the current stats interface). Is that because this is the format expected by the legacy stats loader? If so, perhaps both the exporter and the loader could be improved to not discard information?

That would still leave the issue of someone wanting to switch between ElasticSearch and Solr stats without data loss, if the two store different information.
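
To make the "source of record" idea a bit more concrete, here is a rough sketch of what a lossless usage log line could look like: one JSON object per line, keeping the fields that currently only live in the stats engines. This is only an illustration, not what the exporter or importer do today. The field names, the `bitstream_view` event type, the JSON-lines format and the `YYYY-MM-DD` date pattern in the file name are all my assumptions.

```python
import json
from datetime import datetime

# Hypothetical field names; the fields actually stored by the Solr and
# ElasticSearch stats engines may well differ.
FIELDS = ("time", "type", "id", "ip", "hostname", "user_agent", "user")


def format_event(event):
    """Serialise one usage event as a single JSON line, keeping every field."""
    record = {name: event.get(name) for name in FIELDS}
    return json.dumps(record, sort_keys=True)


def parse_event(line):
    """Read one log line back into a dict (the inverse of format_event)."""
    return json.loads(line)


def log_file_name(when):
    """Daily file name in the spirit of logs/usage-stats.<date>.log.

    The date pattern is an assumption; the original proposal does not
    specify one.
    """
    return "logs/usage-stats.%s.log" % when.strftime("%Y-%m-%d")


# Made-up example event, including the fields that dspace.log lacks.
event = {
    "time": "2015-03-13T07:35:00Z",
    "type": "bitstream_view",
    "id": 1234,
    "ip": "192.0.2.10",
    "hostname": "client.example.org",
    "user_agent": "ExampleBrowser/1.0",
    "user": "someone@example.org",
}

line = format_event(event)
restored = parse_event(line)
```

If both stats engines could write and read a format along these lines, rebuilding an index or switching from one engine to the other would not have to lose the user, user agent or hostname information.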

cheers,
Andrea

--
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel