Hello,
I have been following the discussion about Solr statistics and would like to
make a small contribution.
We have been tasked with producing DSpace statistics reports in the COUNTER
standard. We decided to use AWStats (http://awstats.sourceforge.net) and to
develop a Perl add-on to AWStats for COUNTER compliance.
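The add-on itself will be Perl inside AWStats, but the aggregation logic we
have in mind is roughly the following (a sketch only; the log path, the URL
pattern and the output columns are assumptions about our own setup, shown
here in Python for brevity):

import re
from collections import Counter

# Hypothetical access-log location; adjust to your Apache/Tomcat setup.
LOG = "/var/log/apache2/access.log"

# Very rough pattern for bitstream download requests in a DSpace access log.
BITSTREAM = re.compile(r'\[(\d+)/(\w+)/(\d+).*?\] "GET /bitstream/handle/(\S+) ')

monthly = Counter()  # (bitstream path, "YYYY-Mon") -> download count

with open(LOG) as f:
    for line in f:
        m = BITSTREAM.search(line)
        if m:
            _day, month, year, target = m.groups()
            monthly[(target, "%s-%s" % (year, month))] += 1

# COUNTER-style output: one row per item per month.
for (target, period), hits in sorted(monthly.items()):
    print("%s\t%s\t%d" % (target, period, hits))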
Has anyone already gone this route? Are there any pitfalls?
Regards
Fedor.
>Message: 4
>Date: Mon, 17 Oct 2011 18:15:56 -0400
>From: Peter Dietz <[email protected]>
>Subject: Re: [Dspace-tech] alternative to solr statistics
>To: Richard Rodgers <[email protected]>
>Cc: "[email protected]"
> <[email protected]>
>Message-ID:
> <caootv+o8on5obu+unajfjpxraerlcbfagtaq81hw+js4nga...@mail.gmail.com>
>Content-Type: text/plain; charset="utf-8"
>
>Hi Jesús,
>
>We've run into Solr statistics performance problems as well. You've posted
>that you have a very large Solr index, and unfortunately Solr performance
>degrades as the index grows. We don't allow non-administrators to view
>statistics for a collection/community/item on production because it slows
>the system down too much. However, when we need to provide a report, we copy
>the Solr index to another computer, such as your workstation, and view the
>statistics locally. A local computer with a lot of memory will run Solr
>fine; however, a busy server does not also run Solr that well.
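For what it's worth, the local-copy approach is easy to script against once
the core is on a workstation, since the statistics core is just HTTP. A
minimal sketch, assuming the standard DSpace statistics schema (type/id/isBot
fields) and the copied core running at localhost:8080:

import requests

SOLR = "http://localhost:8080/solr/statistics/select"

def item_views(item_id):
    """Count non-robot view events for one item in the copied core."""
    params = {
        "q": "type:2 AND id:%d AND -isBot:true" % item_id,  # type 2 = item
        "rows": 0,   # only the total count is needed, not the documents
        "wt": "json",
    }
    return requests.get(SOLR, params=params).json()["response"]["numFound"]

print(item_views(123))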
>
>If you want to be able to present reports on your production system, I'm
>thinking the only thing you can throw at the problem is resources. Perhaps
>add an additional server just to host Solr, similar to how you might have
>an additional server just to host MySQL or PostgreSQL. My co-worker and I
>were wondering about the idea of switching out the dspace-stats
>implementation with a different engine, such as removing Solr and using
>something beefier such as ElasticSearch; however, we haven't implemented
>anything.
>
>As has been mentioned by some others, you might be able to figure out how
>to get Google Analytics to track all of the hits to your items, communities,
>collections, and bitstreams. In that case, you could then query the Google
>Analytics API for this information.
>
>Finally, something to "anonymize" the Solr statistics information would be
>a good thing. We currently have the IP address for every visitor to every
>resource for every single request. Assuming we had a good grip on robots, I
>think we could aggregate this down to just the number of hits to a given
>resource per hour. After aggregating and pruning, you might end up with a
>much smaller Solr database: instead of tens of millions of records, perhaps
>just hundreds of thousands. I think one should consult the COUNTER project
>before altering your statistics, though.
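The hour-level aggregation could start from a plain range facet over the
statistics core. A rough sketch, assuming the usual type/id/isBot/time fields
and a Solr version that supports facet.range; it only writes a CSV here
rather than re-indexing into a smaller core:

import csv
import requests

SOLR = "http://localhost:8080/solr/statistics/select"

def hourly_hits(item_id, start, end):
    """Return {hour: hit count} for one item over [start, end), robots excluded."""
    params = {
        "q": "type:2 AND id:%d AND -isBot:true" % item_id,
        "rows": 0,
        "wt": "json",
        "facet": "true",
        "facet.range": "time",
        "facet.range.start": start,   # e.g. "2011-10-01T00:00:00Z"
        "facet.range.end": end,       # e.g. "2011-11-01T00:00:00Z"
        "facet.range.gap": "+1HOUR",
    }
    data = requests.get(SOLR, params=params).json()
    counts = data["facet_counts"]["facet_ranges"]["time"]["counts"]
    # Solr returns a flat [timestamp, count, timestamp, count, ...] list.
    return dict(zip(counts[::2], counts[1::2]))

with open("hourly_hits.csv", "w") as out:
    writer = csv.writer(out)
    writer.writerow(["item_id", "hour", "hits"])
    for hour, hits in sorted(hourly_hits(123, "2011-10-01T00:00:00Z",
                                         "2011-11-01T00:00:00Z").items()):
        if hits:
            writer.writerow([123, hour, hits])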
>
>
>
>Peter Dietz
>
>