Several recent issues (DS-2337, DS-2487, and perhaps DS-2488) suggest
that we should step back and take a long look at how we are using the
Solr 'statistics' core.

Solr seems designed for use as a cache.  That's how the other cores
are used:  they can be refreshed from data in the database and the
assetstore.  But the statistics core is treated as durable storage, a
sink (perhaps the only one) for event data.  If you don't keep your
'dspace.log' files forever, there may be NO WAY to recover statistical
records in the event of disaster or a schema change.  At the very
least, it can take some fancy footwork to carry statistics through an
upgrade.

The Solr maintainers have basically said "don't do that":

  https://wiki.apache.org/solr/HowToReindex#Using_Solr_as_a_Data_Source

I think we need to give some more thought to how we can readily
preserve usage records across DSpace upgrades and system failures.

I should admit here that I am skeptical of using Solr as the
statistics store *at all*, however well it works most of the time.
But it is not my purpose in this note to advocate for something
different.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel