Hi all,

I have news on this issue. After some tests, I decided to use Luke to review the content of the Lucene indexes in each version (before and after the upgrade) and, surprise, the indexes are the same. I've enabled the Solr context over HTTP and started testing different queries to obtain the same results as DSpace. Reviewing the Solr logs, I saw that the query now being issued is:

solr/statistics/select?q=type:+4+AND++id:4&facet.limit=10&facet.field=city&fq=-isBot:true&fq=-(bundleName:[*+TO+*]-bundleName:ORIGINAL)&fq=-(statistics_type:[*+TO+*]+AND+-statistics_type:view)&rows=0

The query issued by the old DSpace is:

fq=-isBot:true&q=type:2+AND+owningComm:4&version=2.2

Therefore the problem is with the kind of queries that generate these statistics, not with the indexes. In fact, if I run the old query in the new DSpace Solr interface, the number of results is the same in both environments.
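
For anyone who wants to reproduce this comparison outside the Solr admin interface, here is a minimal Python sketch. It builds both queries quoted above against the statistics core and reads the total hit count (numFound) from the JSON response. The base URL and the helper names build_url, num_found and count are assumptions for illustration, not part of DSpace. The facet parameters and version=2.2 are left out because they do not change the hit count.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: location of the statistics core; adjust to your installation.
SOLR_SELECT = "http://localhost:8080/solr/statistics/select"

# Query logged by the old DSpace, as quoted above.
OLD_PARAMS = [
    ("q", "type:2 AND owningComm:4"),
    ("fq", "-isBot:true"),
]

# Query logged by the new DSpace, as quoted above (facet parameters omitted).
NEW_PARAMS = [
    ("q", "type: 4 AND  id:4"),
    ("fq", "-isBot:true"),
    ("fq", "-(bundleName:[* TO *]-bundleName:ORIGINAL)"),
    ("fq", "-(statistics_type:[* TO *] AND -statistics_type:view)"),
]


def build_url(params, base=SOLR_SELECT):
    """Return a select URL that asks only for the hit count, as JSON."""
    return base + "?" + urlencode(list(params) + [("rows", "0"), ("wt", "json")])


def num_found(response_text):
    """Extract the total hit count from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


def count(params, base=SOLR_SELECT):
    """Run the query against a live Solr instance and return its hit count."""
    with urlopen(build_url(params, base)) as resp:
        return num_found(resp.read().decode("utf-8"))
```

Calling count(OLD_PARAMS) and count(NEW_PARAMS) against each installation should show whether the difference in the displayed statistics comes from the query or from the index.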

My question is: does anyone know whether the Solr queries were modified and, if so, which part of the code was changed?

Thanks for your help


On 07/06/15 02:42, Jozef Misutka wrote:
Dear all,

because of our visit statistics requirements we have moved from DSpace statistics (DS) to Google Analytics (GA) and then to Piwik. I will describe our main reasons in more detail below because statistics are important for us and I think that our requirements are not that uncommon.

First of all, the DS module has the advantage of knowing the internals of DSpace, which it can exploit for creating reports, and we use some of them. But in the general case, it cannot compete with projects focusing solely on statistics like Piwik/GA because those give you an interactive user interface and a lot more (see http://piwik.org/features/). DS stores the most important information (https://wiki.duraspace.org/display/DSDOC3x/DSpace+Statistics#DSpaceStatistics-Commonstoredfieldsforallusageevents) but you often need even more... You cannot answer basic questions like "is my site being used from mobile devices". In fact, you cannot answer any questions related to statistics in DS in a simple way other than what is hardcoded in DSpace.

Basic Requirements:
* we are funded by grants and we need to answer very specific questions like number of downloads outside of the country where the repository is hosted and quite a few more;
* our users have to do the same;
* (subscribable) detailed monthly visit statistics accessible by DSpace users;
* appealing visualisation;
* consistent project wide statistics (where repository is only one part);
* able to inspect user behaviour.

Why use GA/Piwik instead of DS
* lack of notion for unique pageviews, definable visits, user profiles, entry/exit pages...;
* missing information e.g. what kind of devices were used to access our site;
* missing query user interface;
* missing visualisation and RE;
* DS is a local solution which cannot be used to track other websites/services;
* tracking different things separately (we store statistics separately for a) downloads; b) REST-like usage, e.g., OAI-PMH; c) and everything) [1]

Why use Piwik instead of GA
* limited free plan - can get exceeded when you allow users to subscribe to automated statistics summaries;
* lack of raw data, some queries cumbersome;
* outside project cookie concerns;
* backup of statistics.

Because we think that the Piwik integration can be helpful we are in the process of creating a PR to DSpace but it will take a few more weeks.

Best,
Jozef

[1] Piwik detects downloads using JavaScript, which might not always get triggered.

On 5 June 2015 at 08:48, Bram Luyten <b...@atmire.com> wrote:

    Hi Hilton,

    what are you currently lacking in DSpace stats that makes them
    less trustworthy than Piwik?

    Does Piwik only rely on client-side javascript or do you also have
    methods to rig it to register file downloads?

    I'm a huge fan of Google Analytics (AND the DSpace stats) but my
    enthusiasm recently got tempered after seeing these new ways in
    which client-side JavaScript can be abused for fake traffic and spam:

    https://moz.com/blog/how-to-stop-spam-bots-from-ruining-your-analytics-referral-data

    The core advantages for DSpace stats, in my view, are:

    - You have and own all of the logged data, in its full detail (in
    your SOLR core). This is a major advantage over Google Analytics,
    which won't give you the IPs where traffic originates
    - Logging doesn't depend on client side javascript, but looks at
    the attributes in the HTTP requests that could go to a page, or to
    a bitstream.

    The state of the art is that nobody has a perfect solution yet to
    ensure that reported traffic comes from real, human users. As such,
    it's always a good idea to look at multiple sources of data with a
    critical eye.

    DISCLOSURE: my views in this matter are not unbiased, as we sell
    an advanced user interface that sits on top of DSpace stats
    <http://atmire.com/website/?q=products/content-usage-analysis-module>.
    So we have all reasons to make DSpace stats as trustworthy as
    possible.

    cheers,

    Bram

    --
        *Bram Luyten*
    /2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010/
    /Esperantolaan 4, Heverlee 3001, Belgium/
    www.atmire.com
    



    On 4 June 2015 at 18:08, Hilton Gibson <hilton.gib...@gmail.com> wrote:

        Hi Ruben,

        We also use Piwik to check.
        We added the Piwik JavaScript to the page structure XSL file.
        We get much more detail - so we feel these are more trustworthy.

        Cheers

        hg


        *Hilton Gibson*
        Ubuntu Linux Systems Administrator
        Stellenbosch University Library
        http://staff.lib.sun.ac.za/~hgibson/docs/cv/cv.html


        On 4 June 2015 at 17:53, Ruben <ruben.bo...@csuc.cat> wrote:

            Hi Hilton,

            Yes, the first thing we thought was that bot filtering had
            been improved and that the number of recorded visits had
            dropped as a result (the theory being that many of the old
            visits came from bots). But how do we explain the big
            increase in visits in one of the communities?

            On 04/06/15 17:44, Hilton Gibson wrote:
            Hi Ruben

            Same thing happened to me:
            http://scholar.sun.ac.za/handle/10019.1/1/statistics
            I can only assume that bot filtering has improved?

            Cheers

            hg



            On 4 June 2015 at 17:28, Ruben <ruben.bo...@csuc.cat> wrote:

                Hi Tim,

                I hadn't done the second step, reindexing Solr Stats. I
                have done it now and, although the stats look more complete
                (city and country data now appear), the number of visits is
                still the same. Curiously, in many cases the statistics
                also seem completely different. For example, in one
                community the number of visits in December 2014 is 0 in the
                old DSpace, and after the upgrade 10 visits appear,
                changing the total visits from 1134 to 22620. The most
                curious thing is the appearance of Palo Alto in all of the
                "Top cities views" lists; before the upgrade it didn't
                appear in any list.

                The only thing I can do is list the steps that I took and
                wait to see if anyone can review them:

                1.- Copy [dspace]/solr/statistics from the old VM to the new one.
                2.- wget "http://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-core/3.5.0/lucene-core-3.5.0.jar" -O lucene-core-3.5.0.jar
                3.- java -cp lucene-core-3.5.0.jar org.apache.lucene.index.IndexUpgrader statistics/data/index/
                4.- Move the previously upgraded statistics/data to [dspace-src]/dspace/solr/statistics/
                5.- mvn package
                6.- ant update
                7.- [dspace]/bin/dspace solr-reindex-statistics

                Thanks

                On 04/06/15 15:43, Tim Donohue wrote:
                > Hi Rubén,
                >
                > Have you reindexed your Solr Stats after upgrading?  See step #13 in the
                > "Upgrading DSpace" documentation:
                >
                > https://wiki.duraspace.org/display/DSDOC5x/Upgrading+DSpace
                >
                > There are two steps to actually upgrading Solr Statistics.
                >
                > 1) The first part is to just upgrade the indexes (to the latest Solr
                > version). This part is performed automatically by default (but if the
                > automated version has errors, there are those "Manually Upgrading Solr
                > Indexes" instructions).
                >
                > 2) After the Solr indexes are in the latest version, you ALSO need to
                > perform an "in-place" reindex. This ensures that the structure (schema)
                > within your Solr index is updated to how DSpace now stores statistics.
                > If this is skipped, DSpace may not be able to "read" all the statistical
                > data successfully, and so it may only display a portion of the data that
                > is actually within your index.
                >
                > I hope that helps. But, let us know if you run into further issues. If
                > you have already performed both of these changes, I'd also encourage you
                > to check your DSpace & Tomcat logs to see if any errors are being reported.
                >
                > Tim
                >
                > On 6/4/2015 3:58 AM, Ruben wrote:
                >> Hi all,
                >>
                >> I'm working on a DSpace upgrade from 1.6 to 5.2, and one of the sensitive
                >> points is the migration of statistics. I've been reading about the
                >> need to upgrade the indexes to the latest Solr version, and I've
                >> found this tutorial for doing so:
                >>
                >> https://wiki.duraspace.org/display/DSDOC5x/Upgrading+DSpace#UpgradingDSpace-ManuallyUpgradingSolrIndexes
                >>
                >> I followed the steps of this guide and finally got the statistics
                >> from 1.7 working on 5.2, but many of them are lost. For example, one of
                >> the communities has 317110 views in the old DSpace and now this number
                >> is reduced to 100335, even though the whole conversion process seems to
                >> have gone well. Has anyone run into the same issue with this process, or
                >> does anyone know a better tutorial for this? I'm worried about this
                >> because preserving the old statistics is one of the most important
                >> requirements.
                >>
                >> Thanks in advance for your help,
                >>
                >> Rubén
                >>
                >> --
                >> ........................................................................
                >>
                >> Rubén Boada
                >> Tècnic de Càlcul i Aplicacions
                >> Consorci de Serveis Universitaris de Catalunya (CSUC)
                >>
                >> Gran Capità, 2 (Edifici Nexus).08034 Barcelona
                >> T.93 551 62 13.ruben.bo...@csuc.cat
                >> www.csuc.cat .Twitter @CSUC_info.Facebook.Linkedin
                >> Subscriu-te al butlletí; (www.csuc.cat/butlleti)
                >> ........................................................................
                >>
                >> ------------------------------------------------------------------------------
                >> _______________________________________________
                >> DSpace-tech mailing list
                >> DSpace-tech@lists.sourceforge.net
                >> https://lists.sourceforge.net/lists/listinfo/dspace-tech
                >> List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette









    






--
........................................................................

Rubén Boada
Tècnic de Càlcul i Aplicacions
Consorci de Serveis Universitaris de Catalunya (CSUC)

Gran Capità, 2 (Edifici Nexus).08034 Barcelona
T.93 551 62 13.ruben.bo...@csuc.cat
www.csuc.cat .Twitter @CSUC_info.Facebook.Linkedin
Subscriu-te al butlletí; (www.csuc.cat/butlleti)
........................................................................
