Hi all,

I hope the DSpace community can help with this one.
Since our upgrade to 4.1 in January this year (from 1.8.2, JSPUI with 
PostgreSQL on RHEL 6) we have noticed anomalies in our usage statistics, 
most noticeably in the monthly reports.
Both bitstream views and item views have virtually doubled for no apparent 
reason.
The item views list in the report also shows something strange. I’ve compared 
the Dec 2014 top 10 item views with those of Jan and Feb 2015 (both are post 
4.1 upgrade). As you can see, in Jan and Feb there are some strange groupings 
that are not reflected in the “individual item statistics” pages. Note that a 
large number of items from the “370 to 380” range are represented.
 
<item view comparison>
Item view for Dec 12
item.2123/12257  4836
item.2123/8878   1282
item.2123/6835    720
item.2123/8977    554
item.2123/11883   340
item.2123/12235   270
item.2123/9793    240
item.2123/11882   212
item.2123/673     200
item.2123/894     200
 
 
Item view for Jan 15
item.2123/12257  3150
item.2123/383    1114
item.2123/377    1080
item.2123/379    1070
item.2123/380    1068
item.2123/381    1068
item.2123/386    1060
item.2123/376    1056
item.2123/653    1054
item.2123/858    1046
 
 
Item view for Feb 15
 
item.2123/12257  1284
item.2123/379    1124
item.2123/380    1122
item.2123/376    1086
item.2123/653    1084
item.2123/858    1084
item.2123/383    1080
item.2123/378    1078
item.2123/385    1076
item.2123/381    1074
</ item view comparison>
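To make the pattern in the top-10 lists above concrete, here is a minimal Python sketch that summarizes them. It is not part of DSpace: the names `REPORTS`, `parse` and `summarize` are made up for illustration, and the cut-off of 1000 for a "low" handle number is an arbitrary assumption chosen only to separate the old, low-numbered items from the rest.

```python
import re

# Top-10 item views copied from the lists above (handle prefix 2123).
REPORTS = {
    "Dec": """
        item.2123/12257 4836
        item.2123/8878 1282
        item.2123/6835 720
        item.2123/8977 554
        item.2123/11883 340
        item.2123/12235 270
        item.2123/9793 240
        item.2123/11882 212
        item.2123/673 200
        item.2123/894 200
    """,
    "Jan 15": """
        item.2123/12257 3150
        item.2123/383 1114
        item.2123/377 1080
        item.2123/379 1070
        item.2123/380 1068
        item.2123/381 1068
        item.2123/386 1060
        item.2123/376 1056
        item.2123/653 1054
        item.2123/858 1046
    """,
    "Feb 15": """
        item.2123/12257 1284
        item.2123/379 1124
        item.2123/380 1122
        item.2123/376 1086
        item.2123/653 1084
        item.2123/858 1084
        item.2123/383 1080
        item.2123/378 1078
        item.2123/385 1076
        item.2123/381 1074
    """,
}

LINE = re.compile(r"item\.2123/(\d+)\s+(\d+)")


def parse(block):
    """Return (handle_suffix, views) pairs from 'item.2123/<n> <views>' lines."""
    return [(int(handle), int(views)) for handle, views in LINE.findall(block)]


def summarize(rows, low_cutoff=1000):
    """Count low-numbered handles and give the view range below the top item."""
    rest = [views for _, views in rows[1:]]  # rows[0] is item 12257 in every month
    return {
        "low_handles": sum(1 for handle, _ in rows if handle < low_cutoff),
        "min_views": min(rest),
        "max_views": max(rest),
    }


if __name__ == "__main__":
    for month, block in REPORTS.items():
        print(month, summarize(parse(block)))
```

Run against these lists, it reports 2 handles below 1000 in the Dec top 10 versus 9 in both Jan and Feb, and the views for positions 2 to 10 sit in a narrow band (1046 to 1114 in Jan, 1074 to 1124 in Feb) compared with 200 to 1282 in Dec. That uniformity is what makes us suspect automated traffic rather than real readers.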
We also noticed an increase in Searches Performed, browses and 
browse_by_item, as well as new entries under Number of searches, i.e.:
null            646,721
scope=null      366,071
dspace          292,496
content         292,495
scope=org       292,495
community@2ca    35,540
community@311    31,794
community@2ad    16,262
 
We are not sure whether this is DSpace internal indexing activity being 
counted and, if so, how to filter it out. When we try to filter spiders we 
lose our individual item statistics.
 
 
What we have tried so far to correct the DSpace statistics is the 
following:
 
# Changed value in solr-statistics.cfg to filter out spiders
query.filter.spiderIp = true
 
However, this causes the individual item statistics to disappear altogether 
(perhaps because the query string becomes too long?), so we reset that value 
to its default (i.e. false).
 
The next thing we tried was to remove records flagged as bots from the statistics:
 
# Ran solr stats utilities with the following options
stats-util -m # mark as isBot
stats-util -i # remove by spider IP
stats-util -f # remove by isBot flag
stats-util -o # optimize
 
index-discovery -f # force re-index
index-discovery -b # rebuild
index-discovery -c # cleanup
index-discovery -o # optimize
 
# Then ran the normal sequence of overnight processing, which is essentially:
stat-general
stat-monthly
stat-report-general
stat-report-monthly
stats-util -i
stats-util -o
index-discovery
index-discovery -o
 
All this resulted in no change on the outcome of the monthly reports.
 
Next we tried switching to Lucene instead of Solr:
 
# Changed values in dspace.cfg to enable Lucene search and indexing
ItemCountDAO.class = org.dspace.browse.ItemCountDAOPostgres
browseDAO.class = org.dspace.browse.BrowseDAOPostgres
browseCreateDAO.class = org.dspace.browse.BrowseCreateDAOPostgres
 
# Ran Lucene index utilities with the following options
index-lucene-init -f # full build
index-lucene-init -r # reindex
 
These configuration changes and utility runs had no effect on the monthly 
reports either.
We would be grateful for any suggestions as to the cause of the anomalies in 
the monthly usage statistics and how to filter out DSpace indexing activity.

Thanks a lot,
Gary
 
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
