Hi Peter,

First of all, thanks for the answers.

We've already worked with Google Analytics, and it works pretty well, but you can't control how the statistics are computed, so it's not an option for us.

On the other hand, we've already studied the ElasticSearch option, but it is built on Lucene, so the search methods and RAM consumption are similar. We are now studying the "Sphinx" option, which is much more efficient and faster than Solr.

Finally, I'll give the "Addons" 1.6.2 option a try; I thought it hadn't been updated for newer versions of DSpace.

Regards,

Jesús

On 10/18/2011 12:15 AM, Peter Dietz wrote:
Hi Jesús,

We've run into SOLR statistics performance problems as well. You've posted that you have a very large SOLR index, and unfortunately SOLR performance degrades as the index grows. We don't allow non-administrators to view statistics for a collection/community/item on production because it slows the system down too much. However, when we need to provide a report, we copy the SOLR index to another computer, such as a workstation, and view the statistics locally. A local computer with a lot of memory will run SOLR fine, but a busy server does not run SOLR that well.

If you want to be able to present reports on your production system, I think the only thing you can throw at the problem is resources, perhaps by adding an additional server just to host SOLR, similar to how you might have an additional server just to host MySQL or PostgreSQL. My co-worker and I were wondering about the idea of swapping out the dspace-stats implementation for a different engine, that is, removing SOLR and using something beefier such as ElasticSearch; however, we haven't implemented anything.

As has been mentioned by some others, you might be able to figure out how to get Google Analytics to track all of the hits to your items, communities, collections, and bitstreams. In that case, you could then query the Google Analytics API for this information.

Finally, something to "anonymize" the SOLR statistics information would be a good thing. We currently store the IP address of every visitor to every resource for every single request. Assuming we had a good grip on robots, I think we could aggregate this to record just the number of hits to a given resource per hour. After aggregating and pruning, you might end up with a much smaller SOLR database: instead of tens of millions of records, perhaps just hundreds of thousands. I think one should consult the COUNTER project before altering one's statistics, though.
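
To make the aggregation idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how DSpace or SOLR stores statistics: the record layout (`resource_id`, `ip`, `time`) and the function name `aggregate_hits_per_hour` are hypothetical, and robot filtering is assumed to have happened already.

```python
from collections import Counter
from datetime import datetime


def aggregate_hits_per_hour(hits):
    """Collapse raw hit records into (resource_id, hour) -> hit count.

    The visitor IP is never copied into the result, so the aggregated
    data no longer identifies individual visitors.
    """
    counts = Counter()
    for hit in hits:
        # Truncate the timestamp to the start of its hour.
        hour = hit["time"].replace(minute=0, second=0, microsecond=0)
        counts[(hit["resource_id"], hour)] += 1
    return counts


# Hypothetical raw records; the field names are assumptions for this sketch.
raw_hits = [
    {"resource_id": "item/42", "ip": "192.0.2.1", "time": datetime(2011, 10, 17, 6, 5)},
    {"resource_id": "item/42", "ip": "192.0.2.7", "time": datetime(2011, 10, 17, 6, 50)},
    {"resource_id": "item/42", "ip": "192.0.2.1", "time": datetime(2011, 10, 17, 7, 1)},
    {"resource_id": "collection/3", "ip": "192.0.2.9", "time": datetime(2011, 10, 17, 6, 30)},
]

hourly = aggregate_hits_per_hour(raw_hits)
for (resource_id, hour), count in sorted(hourly.items()):
    print(resource_id, hour.isoformat(), count)
```

The result holds one record per resource per hour rather than one per request, which is where the size reduction would come from; how much it shrinks in practice depends on how the traffic is spread across resources.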



Peter Dietz



2011/10/17 Richard Rodgers <[email protected]>

    Hi Jesús:

    A lot of statistics work has been done for DSpace over time, but
    each project focuses on different sets of requirements:
    does the data need to appear in the UI, does it offer real-time
    availability (just to name two of the strengths of the SOLR-based
    system)?

    One example of an alternative is
    https://wiki.duraspace.org/display/DSPACE/StatisticsAddOn, though
    I don't know if this has been maintained against versions newer
    than DSpace 1.6.2.

    We run an entirely off-line, monthly reporting system using a
    database designed to accommodate a set of internal administrative
    requirements, where statistics are delivered as a spreadsheet,
    but that might not fulfill your requirements.

    The tech list archives and the wiki are a good place to start, but
    you could also post your use case(s) to the list and see if any
    existing work better meets your needs.

    Hope this helps,

    Richard R


    On Oct 17, 2011, at 6:00 AM, Jesús Martín García wrote:

    Hi!

    I've been wondering whether there is some kind of alternative to
    Solr statistics, because of its high RAM load on our system (514
    million records), which is not easy to scale and is very, very
    slow. Has anyone done any work on an alternative?

    Thanks in advance,

    Regards,

    Jesús

-- .......................................................................
          __
        /   /       Jesús Martín García
    C E / S / C A   Tècnic de Projectes
      /__ /         Centre de Serveis Científics i Acadèmics de Catalunya

    Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
    T. 93 551 6213 · F. 93 205 6979 · [email protected]
    .......................................................................


    
------------------------------------------------------------------------------
    All the data continuously generated in your IT infrastructure
    contains a
    definitive record of customers, application performance, security
    threats, fraudulent activity and more. Splunk takes this data and
    makes
    sense of it. Business sense. IT sense. Common sense.
    http://p.sf.net/sfu/splunk-d2d-oct
    _______________________________________________
    DSpace-tech mailing list
    [email protected]
    https://lists.sourceforge.net/lists/listinfo/dspace-tech


    





