Mark and all:

Even if the proposed patch doesn't fit in with the current architecture of the 
system, I think it would be useful to make a binary that includes the fast 
import code easily available.

Graham made some excellent points yesterday evening.  I'm paraphrasing and may 
have muddled this a bit, but:
       -  Just because a system has been made faster in one area doesn't mean 
          it's now scalable.
       -  A gigantic system may break or become unusable in other areas and 
          need other adjustments - for example, search indexes may need to be 
          sharded.

Making the fast import tool available, at least as an option, would give 
organizations one means of quickly loading large amounts of their data into 
test systems so that they can start to poke at prototypes of gigantic systems 
and see where they might break.

I know that there are people with data collection, testing, and research skills 
at organizations that have access to large amounts of data and experience with 
the DSpace system.  They could justify spending staff resources on identifying 
the scalability issues if they could show a gigantic system now.  This fast 
import tool would help them produce the giant test system.

Can the fast importer be made readily available somewhere as an aid to 
identifying and testing scalability issues in the current and future versions 
of DSpace?

thanks,
keith


----- Original Message -----
From: "Mark Diggory" <mdigg...@atmire.com>
To: "Simon Brown" <st...@cam.ac.uk>, dspace-devel@lists.sourceforge.net
Sent: Wednesday, January 27, 2010 6:32:48 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Dspace-devel] [DSJ] Commented: (DS-470) Batch import times 
increase drastically as repository size increases; patch to mitigate the 
problem


We discuss it because we seek to maintain an appropriate separation of
concerns in our architecture, and because Graham usually challenges us
to look at aspects of that architecture that are important.  What is
under discussion is not whether your patch improves performance; you've
identified a very important issue in batch processing.  The
architectural question is whether we want to alter the
Context/EventManager framework and expose calls to pruneIndex.  We
want to be careful to avoid exposing too much of the internals of the
Browse system to the rest of the application architecture.

Excellent work on finding a means to improve DSpace performance.

Cheers,
Mark


-- 
Mark R. Diggory
Head of U.S. Operations - @mire
