Hi all,
I have news on this issue. After some tests, I decided to use
Luke to review the content of the Lucene indexes in each version (before and
after the upgrade) and, surprise, the indexes are the same. I then enabled the
Solr context over HTTP and tried different queries to
reproduce the results shown by DSpace. Reviewing the Solr logs, I saw that
the query now being issued is:
solr/statistics/select?q=type:+4+AND++id:4&facet.limit=10&facet.field=city&fq=-isBot:true&fq=-(bundleName:[*+TO+*]-bundleName:ORIGINAL)&fq=-(statistics_type:[*+TO+*]+AND+-statistics_type:view)&rows=0
The query issued by the old DSpace is:
fq=-isBot:true&q=type:2+AND+owningComm:4&version=2.2
Therefore the problem is in the queries that generate these
statistics, not in the indexes. In fact, if I run the old query in the new
DSpace Solr interface, the number of results is the same in both
environments.
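To make the difference between the two requests easier to see, here is a minimal sketch that decodes both query strings and groups their parameters by how they differ. The two strings are copied from the logs above; the helper names `solr_params` and `diff_params` are made up for this illustration and are not part of DSpace or Solr.

```python
from urllib.parse import parse_qs

# Query strings copied from the Solr logs quoted in this message.
NEW_QUERY = (
    "q=type:+4+AND++id:4&facet.limit=10&facet.field=city"
    "&fq=-isBot:true"
    "&fq=-(bundleName:[*+TO+*]-bundleName:ORIGINAL)"
    "&fq=-(statistics_type:[*+TO+*]+AND+-statistics_type:view)"
    "&rows=0"
)
OLD_QUERY = "fq=-isBot:true&q=type:2+AND+owningComm:4&version=2.2"


def solr_params(query_string):
    """Decode a query string into {parameter: sorted list of values}."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    # Collapse repeated spaces so "AND  id" and "AND id" compare equal.
    return {
        name: sorted(" ".join(value.split()) for value in values)
        for name, values in parsed.items()
    }


def diff_params(old_query, new_query):
    """Hypothetical helper: group parameters by how they differ."""
    old, new = solr_params(old_query), solr_params(new_query)
    return {
        "only_old": {k: old[k] for k in old.keys() - new.keys()},
        "only_new": {k: new[k] for k in new.keys() - old.keys()},
        "changed": {
            k: (old[k], new[k])
            for k in old.keys() & new.keys()
            if old[k] != new[k]
        },
    }


if __name__ == "__main__":
    for group, params in diff_params(OLD_QUERY, NEW_QUERY).items():
        print(group)
        for name in sorted(params):
            print(f"  {name}: {params[name]}")
```

Run on the two strings above, it reports that q and fq changed, that version appears only in the old query, and that the facet and rows parameters appear only in the new one.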
My question is: does anyone know whether the Solr queries were modified and,
if so, which part of the code was changed?
Thanks for your help
On 07/06/15 02:42, Jozef Misutka wrote:
Dear all,
because of our visit statistics requirements we have moved from DSpace
statistics (DS) to Google Analytics (GA) and then to Piwik.
I will describe our main reasons in more detail below because
statistics are important for us and I think that our requirements are
not that uncommon.
First of all, the DS module has the advantage of knowing the internals of
DSpace which it can exploit for creating reports and we use some of
them. But in the general case, it cannot compete with projects
focusing solely on statistics like Piwik/GA because those give you
interactive user interface and a lot more (see
http://piwik.org/features/). DS stores the most important information
(https://wiki.duraspace.org/display/DSDOC3x/DSpace+Statistics#DSpaceStatistics-Commonstoredfieldsforallusageevents)
but you often need even more... You cannot answer basic questions like
"is my site being used from mobile devices". In fact, you cannot
answer any questions related to statistics in DS in a simple way other
than what is hardcoded in DSpace.
Basic Requirements:
* we are funded by grants and we need to answer very specific
questions like number of downloads outside of the country where the
repository is hosted and quite a few more;
* our users have to do the same;
* (subscribable) detailed monthly visit statistics accessible by
DSpace users;
* appealing visualisation;
* consistent project wide statistics (where repository is only one part);
* able to inspect user behaviour.
Why use GA/Piwik instead of DS
* lack of notion for unique pageviews, definable visits, user
profiles, entry/exit pages...;
* missing information e.g. what kind of devices were used to access
our site;
* missing query user interface;
* missing visualisation and RE;
* DS is a local solution which cannot be used to track other
websites/services;
* tracking different things separately (we store statistics separately
for a) downloads; b) REST like usage e.g., OAI-PMH; c) and everything) [1]
Why use Piwik instead of GA
* limited free plan - can be exceeded when you allow users to subscribe to
automated statistic summaries;
* lack of raw data, some queries cumbersome;
* outside project cookie concerns;
* backup of statistics.
Because we think that the Piwik integration can be helpful we are in
the process of creating a PR to DSpace but it will take a few more weeks.
Best,
Jozef
[1] Piwik detects downloads using JavaScript, which might not
always get triggered.
On 5 June 2015 at 08:48, Bram Luyten <b...@atmire.com> wrote:
Hi Hilton,
What are you currently lacking in DSpace stats that makes them
less trustworthy than Piwik?
Does Piwik only rely on client-side javascript or do you also have
methods to rig it to register file downloads?
I'm a huge fan of Google Analytics (AND the DSpace stats) but my
enthusiasm recently got tempered after seeing these new ways in which
client-side JavaScript can be abused for fake traffic and spam:
https://moz.com/blog/how-to-stop-spam-bots-from-ruining-your-analytics-referral-data
The core advantages for DSpace stats, in my view, are:
- You have and own all of the logged data, in its full detail (in
your SOLR core). This is a major advantage over Google Analytics,
which won't give you the IPs where traffic originates
- Logging doesn't depend on client side javascript, but looks at
the attributes in the HTTP requests that could go to a page, or to
a bitstream.
The state of the art is that nobody has a perfect solution yet to
ensure that reported traffic comes from real, human users. As such, it's
always a good idea to look at multiple sources of data with a
critical eye.
DISCLOSURE: my views in this matter are not unbiased, as we sell
an advanced user interface that sits on top of DSpace stats
<http://atmire.com/website/?q=products/content-usage-analysis-module>.
So we have all reasons to make DSpace stats as trustworthy as
possible.
cheers,
Bram
--
*Bram Luyten*
/2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010/
/Esperantolaan 4, Heverlee 3001, Belgium/
www.atmire.com
On 4 June 2015 at 18:08, Hilton Gibson <hilton.gib...@gmail.com> wrote:
Hi Ruben,
We also use Piwik to check.
We added the Piwik JavaScript to the page structure XSL file.
We get much more detail - so we feel these are more trustworthy.
Cheers
hg
*Hilton Gibson*
Ubuntu Linux Systems Administrator
Stellenbosch University Library
http://staff.lib.sun.ac.za/~hgibson/docs/cv/cv.html
On 4 June 2015 at 17:53, Ruben <ruben.bo...@csuc.cat> wrote:
Hi Hilton,
Yes, the first thing we thought was that the filtering of
bots had been improved and that the number of statistics
had dropped because of this (the theory being that many of the
old visits were from bots), but how do we explain the big
increase in visits in one of the communities?
On 04/06/15 17:44, Hilton Gibson wrote:
Hi Ruben
Same thing happened to me:
http://scholar.sun.ac.za/handle/10019.1/1/statistics
I can only assume that bot filtering has improved?
Cheers
hg
On 4 June 2015 at 17:28, Ruben <ruben.bo...@csuc.cat> wrote:
Hi Tim,
I hadn't done the second step, reindexing the Solr stats. Now I have done it
and, although the stats look more complete (city and country data appear),
the number of visits is still the same. Curiously, in many cases the
statistics seem completely different. For example, in one community the
number of visits in December 2014 is 0 in the old DSpace, and after the
upgrade 10 visits appear, with the total visits changing from 1134 to 22620.
The most curious thing is that Palo Alto now appears in all of the
"Top cities views" lists, whereas before the upgrade it didn't appear in
any list.
The only thing I can do is list the steps I followed and hope that someone
can review them:
1.- Copy [dspace]/solr/statistics from the old VM to the new one.
2.- wget "http://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-core/3.5.0/lucene-core-3.5.0.jar" -O lucene-core-3.5.0.jar
3.- java -cp lucene-core-3.5.0.jar org.apache.lucene.index.IndexUpgrader statistics/data/index/
4.- Move the upgraded statistics/data to [dspace-src]/dspace/solr/statistics/
5.- mvn package
6.- ant update
7.- [dspace]/bin/dspace solr-reindex-statistics
Thanks
On 04/06/15 15:43, Tim Donohue wrote:
> Hi Rubén,
>
> Have you reindexed your Solr Stats after upgrading? See step #13 in the
> "Upgrading DSpace" documentation:
>
> https://wiki.duraspace.org/display/DSDOC5x/Upgrading+DSpace
>
> There are two steps to actually upgrading Solr Statistics.
>
> 1) The first part is to just upgrade the indexes (to the latest Solr
> version). This part is performed automatically by default (but if the
> automated version has errors, there are those "Manually Upgrading Solr
> Indexes" instructions).
>
> 2) After the Solr indexes are in the latest version, you ALSO need to
> perform an "in-place" reindex. This ensures that the structure (schema)
> within your Solr index is updated to how DSpace now stores statistics.
> If this is skipped, DSpace may not be able to "read" all the statistical
> data successfully, and so it may only display a portion of the data that
> is actually within your index.
>
> I hope that helps. But, let us know if you run into further issues. If
> you have already performed both of these changes, I'd also encourage you
> to check your DSpace & Tomcat logs to see if any errors are being reported.
>
> Tim
>
> On 6/4/2015 3:58 AM, Ruben wrote:
>> Hi all,
>>
>> I'm working on a DSpace upgrade from 1.6 to 5.2, and one of the sensitive
>> points is the migration of statistics. I've been reading about the
>> need to upgrade the indexes to the latest Solr version, and I've
>> found this tutorial for doing so:
>>
>> https://wiki.duraspace.org/display/DSDOC5x/Upgrading+DSpace#UpgradingDSpace-ManuallyUpgradingSolrIndexes
>>
>> I've followed the steps of this guide and finally got the statistics
>> from 1.7 working on 5.2, but many of them are lost. For example, one of
>> the communities has 317,110 views in the old DSpace and now this number
>> is reduced to 100,335, even though the whole conversion process seems to
>> have gone well. Has anyone run into the same issue with this process, or
>> does anyone know a better tutorial for doing this? I'm worried about this
>> because preserving the old statistics is one of the most important
>> requirements.
>>
>> Thanks in advance for your help,
>>
>> Rubén
>>
>> --
>> ........................................................................
>> Rubén Boada
>> Tècnic de Càlcul i Aplicacions
>> Consorci de Serveis Universitaris de Catalunya (CSUC)
>>
>> Gran Capità, 2 (Edifici Nexus).08034 Barcelona
>> T.93 551 62 13.ruben.bo...@csuc.cat
>> www.csuc.cat .Twitter @CSUC_info.Facebook.Linkedin
>> Subscribe to the newsletter (www.csuc.cat/butlleti)
>> ........................................................................
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> DSpace-tech mailing list
>> DSpace-tech@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>> List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette