Hi Anja,

One idea I have is that with Solr, for performance reasons, we have an
auto-commit process: UsageEvents are not written/committed to Solr until
a commit is triggered, so until then they live only in memory.

So if these periods saw a higher-than-normal (or perhaps even a normal)
number of Tomcat restarts, the pending documents may never have been
written, and so were lost on restart.

Perhaps we could add something to the servlet container shutdown process
that signals DSpace/Solr to flush the pending documents to disk before
shutdown.
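
To make the idea concrete, here is a minimal sketch in plain Java. It is
a toy model, not DSpace or Solr code: BufferedStore, its count-based
threshold, and crash() are all hypothetical names I made up for
illustration, and I am assuming a simple "commit every N documents" rule
rather than whatever our real auto-commit settings are. It only shows
that documents added since the last commit vanish on an unflushed
restart, and that a commit in a shutdown hook would save them.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical) of a store that buffers documents in memory until a commit. */
class BufferedStore {
    private final int autoCommitThreshold;                      // assumed rule: commit every N pending docs
    private final List<String> pending = new ArrayList<>();     // lives only in memory
    private final List<String> committed = new ArrayList<>();   // stands in for the on-disk index

    BufferedStore(int autoCommitThreshold) {
        if (autoCommitThreshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.autoCommitThreshold = autoCommitThreshold;
    }

    /** Buffers one document; commits automatically once the threshold is reached. */
    synchronized void add(String doc) {
        pending.add(doc);
        if (pending.size() >= autoCommitThreshold) {
            commit();
        }
    }

    /** Moves every pending document into the committed list. */
    synchronized void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    /** Simulates a restart without a flush: pending documents are dropped. Returns how many were lost. */
    synchronized int crash() {
        int lost = pending.size();
        pending.clear();
        return lost;
    }

    synchronized int pendingCount() { return pending.size(); }

    synchronized int committedCount() { return committed.size(); }
}

class ShutdownFlushSketch {
    public static void main(String[] args) {
        BufferedStore store = new BufferedStore(1000);

        // Flush on orderly JVM shutdown. In a webapp the same commit call could
        // instead go in ServletContextListener.contextDestroyed(...).
        Runtime.getRuntime().addShutdownHook(new Thread(store::commit));

        for (int i = 0; i < 250; i++) {
            store.add("usage-event-" + i);
        }
        System.out.println("pending before shutdown: " + store.pendingCount());
    }
}
```

Note that a hook like this only helps on an orderly shutdown; a kill -9
or a power loss would still drop whatever is pending. With a threshold
of 1 every add commits immediately, so nothing is ever pending.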

Off the top of my head I don't recall how I wrote to the Elasticsearch
API, but I assume I never made these auto-commit / bulk / batch submit
changes, since I never ran into performance issues with Elasticsearch.
I'm guessing one UsageEvent equals one commit to Elasticsearch, so there
is no data loss on shutdown.

This is just my guess at what could be happening. There could be other
explanations too:

- A corrupt Solr index, though I would guess that would lose a greater
  amount of data.
- A server migration that didn't sync all data properly.
- An unguarded Solr index against which a mischievous user ran a delete
  query.
- Slightly different robot rule processing between the Solr and
  Elasticsearch dspace-stats (unlikely). If your usage baseline were
  entirely robots, then GoogleBot taking a few days off from crawling
  you could cause a valley.

Stats are tricky. Part of me wishes I had just used Google Analytics for
everything, to have one less system to manage. However, I do like the
flexibility you get when you build it yourself.

On Apr 11, 2014 9:54 AM, "Anja Le Blanc" <[email protected]> wrote:

> Hello All,
>
> (We are running on DSpace 1.8.2)
>
> I was looking at our stats data for the last year and a half and I
> noticed periodic drops in views/downloads which are inconsistent with
> the overall usage pattern. (I did not filter out bots for that
> exercise.) Numbers dropped for 1 to 5 days to below 10 and even to 0
> sometimes (from an average of about 5000 per day). I counted about 8
> such events since Jan 2013. (There are possibly more which don't stand
> out as much.) Our DSpace was always running and being monitored during
> that period.
>
> In our set-up we record stats in both Solr and ElasticSearch (at least
> we have done for the last half year). The data for ElasticSearch do not
> show drops for the days where Solr has data gaps. ElasticSearch stats
> recording is triggered by the same DSpace events as Solr is.
>
> Unfortunately we have not kept log files for the periods with Solr data
> gaps.
>
> Has anyone else seen unexpected fluctuations in their stats?
> Does anyone have any idea what could cause it? DSpace and Solr were
> running at the time, since there are some data, just not enough.
>
> To look at the data I use for views
>
> http://localhost:8080/solr/statistics/select/?q=type+%3A+2+&version=2.2&start=0&rows=0&indent=on&facet=true&facet.range=time&f.time.facet.range.start=2013-01-01T00:00:00Z&f.time.facet.range.gap=%2B1DAY&f.time.facet.range.end=2014-04-11T00:00:00Z
>
>
> downloads
>
> http://localhost:8080/solr/statistics/select/?q=type+%3A+0+&version=2.2&start=0&rows=0&indent=on&facet=true&facet.range=time&f.time.facet.range.start=2013-01-01T00:00:00Z&f.time.facet.range.gap=%2B1DAY&f.time.facet.range.end=2014-04-11T00:00:00Z
>
> Interestingly we can prove that there were more events.
>
> Any comments welcome :-)
>
> Best regards,
> Anja
>
>
> ------------------------------------------------------------------------------
> Put Bad Developers to Shame
> Dominate Development with Jenkins Continuous Integration
> Continuously Automate Build, Test & Deployment
> Start a new project now. Try Jenkins in the cloud.
> http://p.sf.net/sfu/13600_Cloudbees
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> List Etiquette:
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>