Hi Peter,
First of all, thanks for the answers.
We've already worked with Google Analytics, and it works pretty
well, but you can't control how the statistics are computed, so it's
not an option for us.
On the other hand, we've already studied the ElasticSearch option, but
it is built on Lucene, so the search methods and RAM consumption are
similar. We are now studying the "Sphinx" option, which is much more
efficient and faster than Solr.
Finally, I'll give the "Addons" 1.6.2 option a try; I had thought it
wasn't updated for newer versions of DSpace.
Regards,
Jesús
On 10/18/2011 12:15 AM, Peter Dietz wrote:
Hi Jesús,
We've run into SOLR statistics performance problems as well. You've
posted that you have a very large solr index, and unfortunately solr
performance degrades as the index grows. We don't allow
non-administrators to view statistics for a collection/community/item
on production because it slows the system down too much. However, when
we need to provide a report, we copy the SOLR index to another
computer, such as a workstation, and view the statistics locally. A
local computer with a lot of memory will run solr fine, whereas a busy
server does not run SOLR that well.
If you want to be able to present reports on your production system,
I'm thinking the only thing you can throw at the problem is resources,
perhaps by adding an additional server just to host SOLR, similar to
how you might have an additional server just to host MySQL or
PostgreSQL.
My co-worker and I were wondering about the idea of switching out the
dspace-stats implementation with a different engine, such as removing
solr, and using something beefier such as ElasticSearch, however we
haven't implemented anything.
As has been mentioned by some others, you might be able to figure out
how to get Google Analytics to track all of the hits to your items,
communities, collections, and bitstreams. In that case, you could then
query the Google Analytics API for this information.
Finally, something to "anonymize" the solr statistics information
would be a good thing. We currently store the IP address of every
visitor to every resource for every single request. Assuming we had a
good grip on robots, I think we could aggregate this to just record the
number of hits to a given resource per hour. After aggregating and
pruning, you might end up with a much smaller solr database: instead
of tens of millions of records, perhaps just hundreds of thousands. I
think one should consult the COUNTER project before altering one's
statistics, though.
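
A minimal sketch of the hourly aggregation idea described above, written
in Python. It is not taken from DSpace or from any existing tool. The
record layout (the resource_id, ip, time and is_bot fields) and the
function name aggregate_hourly are hypothetical and do not reflect the
actual DSpace solr statistics schema; it also assumes robots have
already been flagged.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamp layout for the example records below.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def aggregate_hourly(hits):
    """Collapse per-request records into (resource_id, hour) -> hit count.

    The IP address is deliberately not copied into the result, so the
    aggregated data no longer identifies individual visitors.
    """
    counts = Counter()
    for hit in hits:
        # "is_bot" is an assumed flag; robot detection is out of scope here.
        if hit.get("is_bot"):
            continue
        ts = datetime.strptime(hit["time"], TIME_FORMAT)
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0).strftime(TIME_FORMAT)
        counts[(hit["resource_id"], hour)] += 1
    return counts


# Made-up sample data: four requests, one of them from a robot.
sample_hits = [
    {"resource_id": "item/42", "ip": "192.0.2.10",
     "time": "2011-10-17T06:00:12Z", "is_bot": False},
    {"resource_id": "item/42", "ip": "192.0.2.11",
     "time": "2011-10-17T06:45:03Z", "is_bot": False},
    {"resource_id": "item/42", "ip": "192.0.2.12",
     "time": "2011-10-17T07:01:00Z", "is_bot": True},
    {"resource_id": "item/7", "ip": "192.0.2.10",
     "time": "2011-10-17T06:59:59Z", "is_bot": False},
]

hourly = aggregate_hourly(sample_hits)
for (resource, hour), count in sorted(hourly.items()):
    print(resource, hour, count)
```

Each (resource, hour) pair becomes a single record regardless of how
many requests fell into that hour, which is where the reduction in
index size would come from. Whether hourly granularity is acceptable
for COUNTER-style reporting would still need to be checked.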
Peter Dietz
2011/10/17 Richard Rodgers <[email protected]>
Hi Jesús:
A lot of statistics work has been done for DSpace over time, but
each project focuses on different sets of requirements:
does the data need to appear in the UI, does it offer real-time
availability (just to name two of the strengths of the SOLR-based
system)?
One example of an alternative is
https://wiki.duraspace.org/display/DSPACE/StatisticsAddOn, though
I don't know if this has been maintained against versions newer
than DSpace 1.6.2.
We run an entirely off-line, monthly reporting system using a
database designed to accommodate a set of internal administrative
requirements, where statistics are delivered as a spreadsheet, but
that might not fulfill your requirements.
The tech list archives and the wiki are a good place to start, but
you could also post to the list what your use case(s) are, and see
if any existing
work better meets your needs.
Hope this helps,
Richard R
On Oct 17, 2011, at 6:00 AM, Jesús Martín García wrote:
Hi!
I've been wondering whether there is some kind of alternative to solr
statistics, because of the high RAM load it puts on our system (514
million records); it is not easy to scale and it is very slow.
Has anyone done some work on an alternative?
Thanks in advance,
Regards,
Jesús
--
.......................................................................
__
/ / Jesús Martín García
C E / S / C A Tècnic de Projectes
/__ / Centre de Serveis Científics i Acadèmics de Catalunya
Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
T. 93 551 6213 · F. 93 205 6979 · [email protected]
.......................................................................
------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure
contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and
makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2d-oct
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech