Hi Peter, all,

On 13/03/15 07:35, Peter Dietz wrote:
> ES is equally guilty of being a statistics data source, by storing 
> original/raw. So, statistics is something that complicates DSpace's 
> role in preserving assets, since stats are a value-add, and not a core 
> repository function. But, since repo managers enjoy statistics, we 
> can't not offer statistics. I would however like to offload the role 
> of stats to a third party, such as Google Analytics though.

I mentioned the new GA integration (pushing bitstream downloads to GA) 
to my repository managers. Some of them are still quite concerned, since 
their repositories have stats going back 5+ years. They were not happy 
about losing historical stats data (even keeping in mind how inaccurate 
it probably is).

> Back to the relevant discussion. Both SOLR and ES prefer to be just 
> indexes, something that you could rebuild if necessary. If you have 
> all dspace.log's you potentially could rebuild, but it's very 
> laborious. I've considered having an alternative log file, 
> logs/usage-stats.<date>.log, that was similar to the output of 
> stats-log-exporter|convertor, and input of stats-log-importer. Thus, 
> that would be the source of record, and the stats engines could 
> rebuild from this. Currently more information is being stored in the 
> stats engines than gets logged to dspace.log (useragent, hostname, ...).
>
> I've added the ability for SOLR to export its data to csv: 
> https://github.com/DSpace/DSpace/commit/f57619d726c07535ce786a3f79e9c39d56fd9031
> So, potentially, one could run that regularly to have backup data 
> points...

That's a good start, but your code only stores some of the data, similar 
to what is in the dspace.log files (actually, less than that, since your 
code discards information about the currently logged-in user -- not that 
this is necessarily bad, since this isn't shown in the current stats 
interface). Is that because this is the format expected by the legacy 
stats loader? If so, perhaps both of those could be improved so that they 
don't discard information? That still leaves the issue of someone wanting 
to switch between ElasticSearch and Solr stats without data loss, if the 
two store different information.
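
To make the "don't discard information" idea a bit more concrete, here is 
a rough sketch of an engine-neutral export that writes one JSON object per 
usage event and keeps every field it is given. This is only an 
illustration, not what the current exporter does: the field names, the 
CORE_FIELDS list and all three helper functions are made up for the 
example and are not taken from the Solr or ElasticSearch schemas.

```python
import io
import json

# Hypothetical minimal set of fields every usage event must carry.
# These names are assumptions for illustration, not the real schema.
CORE_FIELDS = ("time", "type", "id", "ip")


def write_events(events, out):
    """Write each usage event as one JSON object per line, keeping all fields."""
    for event in events:
        missing = [f for f in CORE_FIELDS if f not in event]
        if missing:
            raise ValueError(f"event is missing core fields: {missing}")
        out.write(json.dumps(event, sort_keys=True) + "\n")


def read_events(source):
    """Yield events back as written, including any engine-specific extras."""
    for line in source:
        line = line.strip()
        if line:
            yield json.loads(line)


def dropped_fields(events, supported):
    """List the fields that a target supporting only `supported` would lose."""
    lost = set()
    for event in events:
        lost.update(set(event) - set(supported))
    return sorted(lost)


if __name__ == "__main__":
    sample = [{
        "time": "2015-03-13T07:35:00Z", "type": "bitstream", "id": 123,
        "ip": "192.0.2.1", "epersonid": 42, "userAgent": "Mozilla/5.0",
    }]
    buf = io.StringIO()
    write_events(sample, buf)
    buf.seek(0)
    print(list(read_events(buf)) == sample)
    print(dropped_fields(sample, CORE_FIELDS))
```

Something like dropped_fields could also be run before switching from one 
stats engine to the other, to see up front which information the target 
would not be able to hold.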

cheers,
Andrea

-- 
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand


_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel