On 29 September 2010 14:17, Tom De Mulder <[email protected]> wrote:
> I know you like to talk down the problem, but that really isn't helping.
>
This isn't about talking down the problem - it's about finding where the
real problems are and not just patching the immediate concerns. It's also
about considering the interests of the nearly 1000 DSpace instances
registered on dspace.org - many of which will probably be more worried about
rampant resource usage in small repositories if we add overhead to cover
up the problems of larger ones.
> We run 5 DSpace instances, three of these are systems with hundreds of
> thousands of items, and it's dog slow and immensely resource-intensive. And
> yes, we want these to be single systems. Why shouldn't we?
>
Surely the more pertinent question is why wouldn't you want to be able to
run a multi-node solution? I'm sure I don't need to tell you that no matter
how good a job you do of making the system perform better with larger
datasets, there will always be a finite limit to how large the repository
can be, how many users you can service, and how quickly it will process
requests for any given hardware allocation.
Yes, DSpace can do a better job than it currently does, but it's just
postponing the inevitable. How much in technology relies on just making
things bigger/faster? Even our single system hardware is generally made of
multiple identical components - CPUs with multiple cores, memory consisting
of multiple 'sticks', each consisting of multiple storage chips, storage
combining multiple hard drives each having multiple platters.
And many of our dependencies are going the same way - Oracle has database
clusters, Solr is designed to get scalability from running over multiple
shards, and even Postgres has taken a major step towards clustering /
replication with its 9.0 release.
Either way, you will always hit a hard limit with keeping things on a single
system - so at some point, something has to give, whether it's separating
out the DSpace application, Solr and Postgres instances to separate machines,
or accepting this reality in the repository and building it to scale across
multiple nodes itself. This in turn would bring benefits to how easily you
can scale (in theory, it is a lot easier to scale at the repository level
than to scale each of its individual components), as well as potentially
better preservation and federation capabilities.
G
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech