On 4 Sep 2008, at 16:46, Tom De Mulder wrote:

> On Tue, 2 Sep 2008, Dorothea Salo wrote:
>
>> Repository managers: If any of this rings a bell with you, I need you
>> to stand up and say so publicly. "The lurkers support me in email"
>> (see
>> <http://www.collectableboard.com/forums/books/44988-hoppys-poisoned-sanctimony.html>)
>> is no more going to get these problems solved in future than it has
>> in the past.
>
> While I'm not a repository manager, I've looked after a big DSpace
> instance for over 5 years, and I've worn that hat. I agree with
> Dorothea.
>
> We have serious issues with scalability and stability, and the fact
> that the existing codebase is very hard to modify (but I'll leave it
> to my colleague who has to do the development to elaborate).

That would be me. I cannot speak to the 1.5 codebase, but from what I've seen of it so far I don't think there have been many sweeping changes, so most of this probably applies. I refer specifically to 1.4.2.

It's a BigBallOfMud. The boundaries between architectural layers vary from blurred to nonexistent - for example, there is SQL code scattered throughout the codebase, rather than down in a database access layer where it should be. This has several unpleasant effects, the first of which is that if you plan on running on a database other than Postgres or Oracle, you have to hunt down every single piece of SQL throughout the entire codebase and add another "else if" to it. Better hope you get them all. Supposing that you do, and you want to release your additional database support as a patch to assist the community at large, you've got a monster patch touching a large number of files in the codebase rather than one or two additional classes whose presence won't affect anyone who doesn't use them. That's not good design.

It also manifests itself in other ways. Patching the system for properly darkening items is, as Dorothea has already noted, fraught with potential failures. We have a dark items patch which hides items from browse, RSS, and OAI-PMH, and we *think* we've caught everything, but as the only way to do it in the codebase as it stands is - once again - to hunt down every instance of access to items and patch in an authz check, we're still not completely certain. We patched OAI-PMH in something of a hurry not long ago when we realised metadata was leaking through it.

This kind of access control should, once again, be applied at a very low level - any calls to get lists of items for browsing etc. should include the user access context and shouldn't even return items the user should not be able to see. This kind of thing is difficult enough to implement on a well-defined architecture and an unholy nightmare on a bad one.
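
To make the shape of that concrete, here is a rough sketch of a single low-level boundary that always takes the user context and never returns items the user cannot read. Every name in it (ItemDao, UserContext, findReadableBy and so on) is hypothetical and invented for illustration; none of it is DSpace's actual API, and the in-memory class merely stands in for a Postgres- or Oracle-specific implementation that would keep its SQL behind the same interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// All names here are hypothetical; none of them are DSpace classes.
final class ItemAccessSketch {

    /** Who is asking: stands in for a real user/session context. */
    static final class UserContext {
        final Set<String> groups;
        UserContext(String... groups) {
            this.groups = new HashSet<>(Arrays.asList(groups));
        }
    }

    /** An item and the groups allowed to read it; no groups means dark. */
    static final class Item {
        final String title;
        final Set<String> readGroups;
        Item(String title, String... readGroups) {
            this.title = title;
            this.readGroups = new HashSet<>(Arrays.asList(readGroups));
        }
    }

    /**
     * The one boundary through which callers obtain items. Support for
     * another database would be one more implementation of this interface.
     */
    interface ItemDao {
        List<Item> findReadableBy(UserContext user);
    }

    /** In-memory stand-in for a database-specific implementation. */
    static final class InMemoryItemDao implements ItemDao {
        private final List<Item> items;
        InMemoryItemDao(List<Item> items) {
            this.items = new ArrayList<>(items);
        }

        @Override
        public List<Item> findReadableBy(UserContext user) {
            List<Item> visible = new ArrayList<>();
            for (Item item : items) {
                // Keep the item only if the user shares a group with it.
                if (!Collections.disjoint(item.readGroups, user.groups)) {
                    visible.add(item);
                }
            }
            return visible;
        }
    }

    /** Titles visible to a user; browse, RSS and OAI-PMH would all go through the DAO. */
    static List<String> visibleTitles(ItemDao dao, UserContext user) {
        List<String> titles = new ArrayList<>();
        for (Item item : dao.findReadableBy(user)) {
            titles.add(item.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        ItemDao dao = new InMemoryItemDao(Arrays.asList(
                new Item("Open thesis", "anonymous", "staff"),
                new Item("Embargoed dataset", "staff"),
                new Item("Dark item")));
        System.out.println(visibleTitles(dao, new UserContext("anonymous")));
        System.out.println(visibleTitles(dao, new UserContext("staff")));
    }
}
```

With something along these lines, a dark item is filtered in exactly one place instead of in every caller, and adding a database means adding a class rather than another "else if" in many files.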

Now, in a way, I can understand why fixing these things hasn't been high on anyone's list - my institution has things it would rather I do with our system in the same way that anyone else's does. What I'm less sure of is why a better architecture (which would benefit everyone who works with the codebase and therefore, indirectly, everyone else who uses DSpace) hasn't been more of a priority for the federation.

I don't really want to address what makes an institutional repository good or bad because it's really not my area of expertise; I do feel that addressing the quality of the code itself will make it much easier for everyone who uses DSpace to bend it towards their particular needs.

Regards,
--
Simon Brown <[EMAIL PROTECTED]> - Cambridge University Computing Service
+44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
