Here are some details about DigitalGeorgetown.

- Total items: 546,000
- Public items: 397,000
- Citation only items: ~470,000

As we tested and migrated to DSpace 6x, we did encounter a few performance issues. We have contributed patches to DSpace 6x releases (and to the future DSpace 6.4 release) to help resolve these issues.

We preserve our assets in the APTrust (Academic Preservation Trust) service, so we do not run the DSpace checksum checker on our DSpace instance.

Terry

On Fri, Aug 23, 2019 at 7:48 AM Tim Donohue <[email protected]> wrote:

> Hello Vlastimil,
>
> Unfortunately, the size of DSpace sites is very difficult to track overall
> (it relies entirely on self reporting).
>
> I know there are very large sites out there... a few that come to mind are
> U of Cambridge (https://www.repository.cam.ac.uk), and Georgetown
> University (https://repository.library.georgetown.edu/). I cannot claim
> to know exactly how large the sites are though, as each of these sites may
> have access restricted content (which is not even visible on the web).
> However, in terms of public content alone each has 250-350 thousand items.
>
> I also admit that I don't know whether there are larger sites out there.
> But, maybe institutions on this mailing list will self-report if they have
> more than 400 thousand items. (I know I'd love to hear which sites have
> >400K items!)
>
> I think Mark Wood gave a thorough answer regarding the number of items
> possible in a DSpace. Technically, the biggest limitation is the amount of
> server space & memory available (as larger sites need more of each). For
> each release we attempt to make DSpace as performant (and memory lean) as
> we can, and as memory issues are reported we resolve them as bugs in a new
> release.
> For example, for the upcoming DSpace 7 release (which is still
> under active development) we are running more detailed performance testing
> as detailed here:
> https://wiki.duraspace.org/display/DSPACE/DSpace+7+Performance+Testing
> At this time, that performance testing is more geared towards minimizing
> CPU load and memory overall (which will also help in scaling).
>
> Tim
>
> ------------------------------
> *From:* [email protected] <[email protected]> on behalf of Vlastimil Krejčíř <[email protected]>
> *Sent:* Friday, August 23, 2019 5:57 AM
> *To:* DSpace Community <[email protected]>
> *Subject:* [dspace-community] Scalability of DSpace
>
> Hi all,
>
> back in April 2013 I asked the community about DSpace scalability, see:
>
> http://dspace.2283337.n4.nabble.com/DSpace-scalability-tens-of-hundreds-TBs-tt4662988.html#a4663047
>
> Now, in 2019, it is time to ask the same question :-).
>
> How much data / how many items can DSpace handle? The DSpace system at
> Cambridge University (https://www.repository.cam.ac.uk/) was reported as
> the largest then. I can see it stores about 245 thousand items nowadays.
>
> Does anyone else have a bigger one? Is there new information on scalability
> since 2013?
>
> Regards,
>
> Vlastik Krejčíř
>
> --
>
> ----------------------------------------------------------------------------
> Vlastimil Krejčíř
> Library and Information Centre, Institute of Computer Science
> Masaryk University, Brno, Czech Republic
> Email: krejcir (at) ics (dot) muni (dot) cz
> Phone: +420 549 49 3872
> OpenPGP key: https://kic-internal.ics.muni.cz/~krejvl/pgp/
> Fingerprint: 7800 64B2 6E20 645B 56AF C303 34CB 1495 C641 11B9
>
> ----------------------------------------------------------------------------
>
> --
> All messages to this mailing list should adhere to the DuraSpace Code of
> Conduct: https://duraspace.org/about/policies/code-of-conduct/
> ---
> You received this message because you are subscribed to the Google Groups
> "DSpace Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-community/a37b7af1-59eb-4a7e-b302-196cadbed7a0%40googlegroups.com
> .

--
Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
https://github.com/terrywbrady/info
425-298-5498 (Seattle, WA)
