On Tue, 2008-03-18 at 13:14 +0000, Simon Brown wrote:
> On 13 Mar 2008, at 21:34, Richard Rodgers wrote:
>
> > On Thu, 2008-03-13 at 16:23 +0000, Simon Brown wrote:
> >> I'm still curious about the necessity of the cache, as our removing it
> >> had no noticeable impact on performance and in fact increased the
> >> responsiveness of the site when we did before-and-after tests with
> >> Siege.
> >>
> > Most of these questions/observations stem from your assumption that the
> > cache is there for performance, which it is not (except incidentally).
> > The primary purpose is (roughly) transactional integrity - ensuring that
> > there is only a single copy of an Item, etc, in play before the Context
> > commits. In this light, it makes sense that there might be a slight
> > performance penalty.
>
> So there are instances where, in the course of a single HTTP request,
> multiple instances of the same Item may (in the absence of the cache)
> be instantiated, modified, and committed to the database? Could you
> give me an example of when this could happen? Clearly, as we're
> currently running without the cache, we're in danger of having this
> happen, and I'd like to be able to evaluate this risk.

Not that I know of, but I'm not sure that is the only question to ask in
evaluating risk.

The cache has to do with guaranteeing the safety of the DSpace
programming API, which at its most basic is:

(1) get a context
(2) do some work
(3) commit all work as an atomic transaction
(4) free the context

Note that the work need not be confined to a single HTTP request (that
only came up because we were discussing spidering, where that happens to
be true) - a context can have an arbitrarily long life, and involve an
unlimited number of database reads, updates, etc. Thus it would be easy
to write application code like:

    Read object A (as part of iterating through a collection)
    Modify it, update database
    ... a lot of other operations
    Read object A (in another iteration)
    Modify it, update database
    ... other operations
    commit

Without the cache, the first set of modifications would be lost. Now we
certainly could guard against this by vetting all the application logic
looking for problems, but the cache provides cheap (but not free, as
your Siege profiling shows) insurance against it.

From a risk mitigation standpoint, I'd say as long as you have a very
stable and well-understood system, risk should be fairly low - note,
however, that I haven't done an exhaustive analysis. But DSpace is
moving into a more modular world in which non-core (= independently
developed) code will constitute an increasingly large part of its
functionality. In such a world, API safeguards like the cache look
increasingly good, in the sense of justifying their performance price.

You raise some very interesting questions, and I don't want to convey
the impression that the DSpace architecture is 'fully baked' in this
area: one suggestion with merit I've heard proposed (by Rob Tansley) is
to segregate Contexts into 'read-only' and 'writable'. The former could
then utilize a shared cache much like the one that you first imagined
the context cache was. I think you will see continued work on this as we
move to 2.0.
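
To make the read/modify/read/modify/commit sequence above concrete, here is a minimal, self-contained sketch of a per-context identity cache. It is not DSpace code: `SketchContext`, `Record`, and `runScenario` are hypothetical stand-ins, and the model assumes that each uncached read builds a fresh in-memory copy from the committed state and that writes are flushed only at commit. Under those assumptions it shows how two independent copies of the same object lead to a lost update, and how the cache avoids it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; none of these types are part of DSpace.
final class ContextCacheSketch {

    /** Hypothetical stored object with two independently edited fields. */
    static final class Record {
        final int id;
        String title;
        String note;

        Record(int id, String title, String note) {
            this.id = id;
            this.title = title;
            this.note = note;
        }
    }

    /** Hypothetical context: optional identity cache, writes flushed at commit. */
    static final class SketchContext {
        private final Map<Integer, Record> database;
        private final boolean useCache;
        private final Map<Integer, Record> cache = new HashMap<>();
        private final List<Record> pending = new ArrayList<>();

        SketchContext(Map<Integer, Record> database, boolean useCache) {
            this.database = database;
            this.useCache = useCache;
        }

        /** Returns the cached instance if present, else a fresh copy. */
        Record read(int id) {
            if (useCache && cache.containsKey(id)) {
                return cache.get(id);
            }
            Record stored = database.get(id);
            Record copy = new Record(stored.id, stored.title, stored.note);
            if (useCache) {
                cache.put(id, copy);
            }
            return copy;
        }

        /** Assumption of this model: the write is queued until commit. */
        void update(Record record) {
            pending.add(record);
        }

        /** Flushes queued writes in order; a later write overwrites an earlier one. */
        void commit() {
            for (Record r : pending) {
                database.put(r.id, new Record(r.id, r.title, r.note));
            }
            pending.clear();
            cache.clear();
        }
    }

    /** Runs the sequence from the message: read A, modify, read A again, modify, commit. */
    static String runScenario(boolean useCache) {
        Map<Integer, Record> database = new HashMap<>();
        database.put(1, new Record(1, "old title", "old note"));

        SketchContext context = new SketchContext(database, useCache);

        Record first = context.read(1);
        first.title = "new title";
        context.update(first);

        Record second = context.read(1); // same instance only when cached
        second.note = "new note";
        context.update(second);

        context.commit();

        Record result = database.get(1);
        return result.title + " / " + result.note;
    }

    public static void main(String[] args) {
        System.out.println("with cache:    " + runScenario(true));
        System.out.println("without cache: " + runScenario(false));
    }
}
```

In this model the cached run ends with `new title / new note`, while the uncached run ends with `old title / new note`: the second copy was built from the unmodified state, so writing it at commit overwrites the first edit. Whether real application code hits this depends on how it reads and updates objects, which is the vetting effort the cache is meant to spare.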

>
> From this I'm also assuming that Browse.indexAll() won't do anything
> to the database until the context commits after the call is done,
> which for big repositories would be another way for this method to use
> up a fair amount of heap.

Correct. The browse system underwent a number of changes moving from 1.4
through 1.5, and I'm not conversant with it now, but in 1.4, I see no
cache flushing. If you are running into difficulty, insert commit()s and
item.decache()s to get a lower-profile heap.

> Thanks,

Richard R

> --
> Simon Brown <[EMAIL PROTECTED]> - Cambridge University Computing Service
> +44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

