Thank you, Terry. How fast does your DSpace grow? How many items do you add per month or per year? Do you use clustering or load balancing? What kind of hardware do you need to run it? I would be grateful if you could share that information.
Vlastik

On 8/23/19 6:28 PM, Terry Brady wrote:
> Here are some details about DigitalGeorgetown.
>
> * Total items: 546,000
> * Public items: 397,000
> * Citation only items: ~470,000
>
> As we tested and migrated to DSpace 6.x, we did encounter a few
> performance issues. We have contributed patches to DSpace 6.x releases
> (and to the future DSpace 6.4 release) to help resolve these issues.
>
> We preserve our assets in the APTrust (Academic Preservation Trust)
> service, so we do not run the DSpace checksum checker on our DSpace
> instance.
>
> Terry
>
> On Fri, Aug 23, 2019 at 7:48 AM Tim Donohue <[email protected]> wrote:
>
> Hello Vlastimil,
>
> Unfortunately, the size of DSpace sites is very difficult to track
> overall (it relies entirely on self-reporting).
>
> I know there are very large sites out there. A few that come to
> mind are U of Cambridge (https://www.repository.cam.ac.uk/) and
> Georgetown University (https://repository.library.georgetown.edu/).
> I cannot claim to know exactly how large these sites are, though, as
> each of them may have access-restricted content (which is not even
> visible on the web). However, in terms of public content alone, each
> has 250-350 thousand items.
>
> I also admit that I don't know whether there are larger sites out
> there. But maybe institutions on this mailing list will
> self-report if they have more than 400 thousand items. (I know I'd
> love to hear which sites have >400K items!)
>
> I think Mark Wood gave a thorough answer regarding the number of
> items possible in a DSpace instance. Technically, the biggest
> limitation is the amount of server space and memory available (as
> larger sites need more of each). For each release we attempt to make
> DSpace as performant (and memory-lean) as we can, and as memory
> issues are reported we resolve them as bugs in a new release.
> For example, for the upcoming DSpace 7 release (which is still under
> active development) we are running more detailed performance testing,
> as described here:
> https://wiki.duraspace.org/display/DSPACE/DSpace+7+Performance+Testing
> At this time, that performance testing is more geared towards
> minimizing CPU load and overall memory use (which will also help in
> scaling).
>
> Tim
>
> ------------------------------------------------------------------------
> *From:* [email protected] on behalf of Vlastimil
> Krejčíř <[email protected]>
> *Sent:* Friday, August 23, 2019 5:57 AM
> *To:* DSpace Community <[email protected]>
> *Subject:* [dspace-community] Scalability of DSpace
>
> Hi all,
>
> Back in April 2013, I asked the community about DSpace scalability,
> see:
>
> http://dspace.2283337.n4.nabble.com/DSpace-scalability-tens-of-hundreds-TBs-tt4662988.html#a4663047
>
> Now, in 2019, it is time to ask the same question :-).
>
> How much data / how many items can DSpace handle? The DSpace system
> at Cambridge University (https://www.repository.cam.ac.uk/) was
> reported as the largest at that time. I can see it stores about 245
> thousand items nowadays.
>
> Does anyone else have a bigger one? Is there any new information on
> scalability since 2013?
>
> Regards,
>
> Vlastik Krejčíř
>
> --
> ----------------------------------------------------------------------------
> Vlastimil Krejčíř
> Library and Information Centre, Institute of Computer Science
> Masaryk University, Brno, Czech Republic
> Email: krejcir (at) ics (dot) muni (dot) cz
> Phone: +420 549 49 3872
> OpenPGP key: https://kic-internal.ics.muni.cz/~krejvl/pgp/
> Fingerprint: 7800 64B2 6E20 645B 56AF C303 34CB 1495 C641 11B9
> ----------------------------------------------------------------------------
>
> --
> All messages to this mailing list should adhere to the DuraSpace
> Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
> ---
> You received this message because you are subscribed to the Google
> Groups "DSpace Community" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-community/a37b7af1-59eb-4a7e-b302-196cadbed7a0%40googlegroups.com.
>
> --
> Terry Brady
> Applications Programmer Analyst
> Georgetown University Library Information Technology
> https://github.com/terrywbrady/info
> 425-298-5498 (Seattle, WA)
