This points out a problem that I think we (and many other contemporary projects) have all over the place: our application is expected to grow steadily and without limit, yet we assume over and over again that the problem is small and bounded.
There is no way around it: if your repository is large and busy, sooner or later you will be disappointed by the performance of ad-hoc queries, no matter how many resources you throw at them. There are *far* more efficient methods for extracting information from vast quantities of data than those presently provided.

One answer is to depend less on ad-hoc queries. Do you have some "usual questions" to be answered over and over? Do you really need up-to-the-second answers, or would it be good enough to run periodic reports and accumulate them? Some other machine with SPSS or R or whatever can grind cases all night, if need be, and leave your monthly abstract waiting in your inbox the next day. (I want to find the time to extend DSpace to facilitate this.) If the periodic abstractions are saved in raw form before rendering, they become cheap inputs to longer-range reports.

Once periodic statistical products are available, they can simply be fetched over and over again and slotted into DSpace pages to provide tolerably up-to-date views of activity quickly and cheaply. We just don't do that yet.

Once periodic statistical products are available, we don't have to keep twenty years of event data in Solr; we can purge old cases to dead storage and combine precalculated summaries with live statistics over only the latest events, keeping the numbers fresh without letting responsiveness suffer more and more over time. We just don't do that yet.

Once we have a well-designed way to get cases out of DSpace for use with other tools, we can produce as many streams as we wish, selected any way that makes sense, and cheaply provide custom-tailored data products to individual contributors and other consumers for their own analysis. We just don't do that yet.

There's still an important place for ad-hoc query, but how often would something less expensive do just as well? ALL cases are historical; they're not going to change.
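To make the "precalculated summaries plus live statistics" idea concrete, here is a minimal sketch, not DSpace code: the data layout and names (`monthly_summaries`, `usage_report`, the event shape) are all hypothetical. The point is only that archived periods are summed once from cheap stored summaries, and only the small tail of not-yet-summarized events has to be scanned at query time.

```python
from collections import Counter

# Hypothetical precomputed monthly summaries, saved in raw form:
# {(year, month): Counter mapping item id -> view count}.
# In practice these would be produced by the periodic reporting run
# and old raw events purged to dead storage afterward.
monthly_summaries = {
    (2011, 8): Counter({"item:1": 120, "item:2": 45}),
    (2011, 9): Counter({"item:1": 98, "item:2": 60}),
}

def usage_report(live_events):
    """Combine archived monthly summaries with live events from the
    current, not-yet-summarized period to produce fresh totals."""
    totals = Counter()
    # Archived periods: cheap, precalculated, never change.
    for summary in monthly_summaries.values():
        totals += summary
    # Only the latest events are scanned at query time.
    for event in live_events:
        totals[event["item"]] += 1
    return totals

# e.g. usage_report([{"item": "item:1"}, {"item": "item:3"}])
```

Query cost then depends on the size of the current period, not on twenty years of history; the historical totals are fixed and only need recalculating if our view of the cases changes.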
We only need to recalculate when we change our view of the cases.

-- 
Mark H. Wood, Lead System Programmer   [email protected]
Asking whether markets are efficient is like asking whether people are smart.
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

