Hi all,

I hope the DSpace community can help with this one.

Since our upgrade to 4.1 in January this year (from 1.8.2; JSPUI with PostgreSQL on RHEL 6), we have noticed anomalies in our usage statistics, most noticeably in the monthly reports. Both bitstream views and item views have virtually doubled for no apparent reason.

The item views report also shows something strange. Below is a comparison of the top 10 item views for Dec 2014 with those for Jan and Feb 2015 (both after the 4.1 upgrade). In Jan and Feb there are some strange groupings that are not reflected in the “individual item statistics” pages. Note the large number of items from the “370 to 380” range.

<item view comparison>

Item views for Dec 14
    item.2123/12257    4836
    item.2123/8878     1282
    item.2123/6835      720
    item.2123/8977      554
    item.2123/11883     340
    item.2123/12235     270
    item.2123/9793      240
    item.2123/11882     212
    item.2123/673       200
    item.2123/894       200

Item views for Jan 15
    item.2123/12257    3150
    item.2123/383      1114
    item.2123/377      1080
    item.2123/379      1070
    item.2123/380      1068
    item.2123/381      1068
    item.2123/386      1060
    item.2123/376      1056
    item.2123/653      1054
    item.2123/858      1046

Item views for Feb 15
    item.2123/12257    1284
    item.2123/379      1124
    item.2123/380      1122
    item.2123/376      1086
    item.2123/653      1084
    item.2123/858      1084
    item.2123/383      1080
    item.2123/378      1078
    item.2123/385      1076
    item.2123/381      1074

</item view comparison>

We have also noticed an increase in Searches Performed, browses and browse_by_item, as well as new entries under Number of searches, for example:

    null              646,721
    scope=null        366,071
    dspace            292,496
    content           292,495
    scope=org         292,495
    community@2ca      35,540
    community@311      31,794
    community@2ad      16,262

We are not sure whether this is DSpace's internal indexing activity being counted, or how to filter it out if that is the case. When we try to filter spiders, we lose our individual item statistics.

Here is what we have tried so far to correct the statistics.

First, we changed a value in solr-statistics.cfg to filter out spiders:

    query.filter.spiderIp = true

However, this causes the individual item statistics to disappear altogether (perhaps because the query string becomes too long?), so we reset the value to its default (false).

Next, we tried removing bot records from the statistics. We ran the Solr stats utilities with the following options:

    stats-util -m          # mark as isBot
    stats-util -i          # remove by spider IP
    stats-util -f          # remove by isBot flag
    stats-util -o          # optimize
    index-discovery -f     # force re-index
    index-discovery -b     # rebuild
    index-discovery -c     # cleanup
    index-discovery -o     # optimize

We then ran the normal sequence of overnight processing, which is essentially:

    stat-general
    stat-monthly
    stat-report-general
    stat-report-monthly
    stats-util -i
    stats-util -o
    index-discovery
    index-discovery -o

None of this changed the monthly reports.

Next, we tried switching from Solr to Lucene. We changed these values in dspace.cfg to enable Lucene search and indexing:

    ItemCountDAO.class = org.dspace.browse.ItemCountDAOPostgres
    browseDAO.class = org.dspace.browse.BrowseDAOPostgres
    browseCreateDAO.class = org.dspace.browse.BrowseCreateDAOPostgres

and ran the Lucene utilities with the following options:

    index-lucene-init -f   # full build
    index-lucene-init -r   # reindex

These configuration changes and utility runs had no effect on the monthly reports either.
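
In case it helps with diagnosis, one further check we could run is to tally item views per client IP straight from the log files. What follows is only a minimal Python sketch under two assumptions we have not confirmed on our 4.1 install: that the monthly reports are built from dspace.log rather than from the Solr statistics core, and that item view entries contain "ip_addr=...:view_item:handle=...". The names views_by_ip, top_sources and SAMPLE, and the sample log lines themselves, are made up for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: a view entry in dspace.log looks roughly like
#   ... @ user:session_id=...:ip_addr=192.0.2.10:view_item:handle=2123/377
# This shape has not been checked against our 4.1 logs.
VIEW_ITEM = re.compile(r"ip_addr=(?P<ip>\S+?):view_item:handle=(?P<handle>\S+)")


def views_by_ip(lines, handle_prefix="2123/"):
    """Count view_item entries per (handle, ip) pair."""
    counts = Counter()
    for line in lines:
        match = VIEW_ITEM.search(line)
        if match and match.group("handle").startswith(handle_prefix):
            counts[(match.group("handle"), match.group("ip"))] += 1
    return counts


def top_sources(counts, handle, n=5):
    """Return the n IP addresses with the most views of one handle."""
    per_ip = Counter({ip: c for (h, ip), c in counts.items() if h == handle})
    return per_ip.most_common(n)


# Made-up sample lines (documentation-range IPs), NOT taken from our logs.
SAMPLE = [
    "2015-01-05 02:11:09,120 INFO  x @ anonymous:session_id=A:ip_addr=192.0.2.10:view_item:handle=2123/377",
    "2015-01-05 02:11:10,482 INFO  x @ anonymous:session_id=A:ip_addr=192.0.2.10:view_item:handle=2123/377",
    "2015-01-05 09:30:41,007 INFO  x @ anonymous:session_id=B:ip_addr=198.51.100.7:view_item:handle=2123/377",
    "2015-01-05 09:31:02,650 INFO  x @ anonymous:session_id=B:ip_addr=198.51.100.7:view_item:handle=2123/12257",
    "2015-01-05 09:31:30,915 INFO  x @ anonymous:session_id=B:ip_addr=198.51.100.7:search:query=thesis",
]

if __name__ == "__main__":
    counts = views_by_ip(SAMPLE)
    for ip, views in top_sources(counts, "2123/377"):
        print(ip, views)
```

In real use the lines would come from the log files for the month in question rather than from SAMPLE. If a single address (or our own server's address) accounted for most of the views of items such as 2123/377, that would point to a crawler or an internal process rather than real readers.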
We would be grateful for any suggestions as to the reason for the anomalies in the monthly usage statistics and how to filter DSpace indexing activity.
Thanks a lot,
Gary