Bill, I just saw your request on discussing bitstream sizes in DSpace and would like to join the conversation.
We also have some reservations about uploading large files to DSpace via the web UI. For us, the reasonable limit for uploads this way is about 5 GB. If someone wants to publish data larger than this (which has happened a couple of times so far, and we expect it to happen more often in the future), we offer to let them upload the files to our server via WebDAV. Once we have the files, we build an SAF import package from them and ingest it into DSpace on behalf of the user, sending it through the normal approval workflow. There we reject the item, so that the user has the chance to review and change the metadata before it is eventually approved.

Another scenario using a similar approach is to let the user create a new item in DSpace, but tell them not to upload the file as part of that process and to use the WebDAV option instead. Once we know the number of the item the user created and have the files, we can add the files with the dspace command (however, this is a new function that The Library Code implemented for us, so it is not available in generic DSpace yet).

We do not use the upload limit in DSpace. As far as I remember, it was not working properly in the DSpace 5 JSPUI. But admittedly I haven't tried it for a long time now, so maybe this issue has been solved in the meantime.

Let me add another aspect of large files, but in the other direction (download rather than upload): we store all of our bitstreams on S3 storage, and as DSpace 5 does not natively support that, we use Cloudian Hyperfile, which provides an NFS-mountable volume that is linked to our assetstore. That means all of the bitstreams (including thumbnails, licenses and so on) go to the S3 storage.
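
For anyone who hasn't built an SAF (Simple Archive Format) package by hand: the import package mentioned above is essentially a directory per item holding a dublin_core.xml, a contents file and the files themselves. Below is a minimal sketch in Python of how one item directory could be assembled. It is an illustration only, not the script used at any particular site; the function name build_saf_item, the item directory name and the metadata values are hypothetical.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def build_saf_item(package_dir, item_name, metadata, files):
    """Create one SAF item directory (hypothetical helper, for illustration).

    metadata: list of (element, qualifier, value) tuples; qualifier may be None.
    files:    paths of the files that should become bitstreams of the item.
    """
    item_dir = Path(package_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata value.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the files next to the metadata. For very large files, moving or
    # linking them instead of copying may be preferable to save disk space.
    names = []
    for src in map(Path, files):
        shutil.copy2(src, item_dir / src.name)
        names.append(src.name)

    # contents: one bitstream file name per line.
    (item_dir / "contents").write_text(
        "".join(name + "\n" for name in names), encoding="utf-8"
    )
    return item_dir
```

A call such as build_saf_item("saf_package", "item_000", [("title", None, "Example dataset")], ["/data/upload/big.tar"]) would produce a directory that the DSpace batch importer should be able to read. The importer's --workflow option is the usual way to send such an item through the collection's workflow rather than archiving it directly, but check the exact options against the documentation for your DSpace version.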

The S3 setup is basically working fine, but with large files we once had some trouble with web crawlers harvesting those files: if too many users were fetching too many of the large files in parallel, this caused cache problems on the Hyperfile volume. To avoid this, we tuned some of the cache settings on Hyperfile and excluded the big files from crawling in our robots.txt (we think it would be rather useless to crawl them anyway). That has solved the problem so far, but I guess it's worth noting. If anyone has experience with other download regulators that prevent a user from downloading too much in parallel, I would be eager to hear about it.

Best
Oliver

Bill Tantzen wrote on Thursday, 1 April 2021 at 20:23:12 UTC+2:

> If you have a minute, I am trying to get a feel for some of the larger
> (reasonable) bitstreams the community is currently supporting. On my site,
> we have removed the DSpace upload limits to allow for records containing
> research data, but of course there are practical limits that dictate what
> makes for a good user experience.
>
> What is the largest bitstream you support? Do you enforce upload limits?
> Assuming download speeds are faster than upload speeds, what are some of
> the methods in use (besides the DSpace gui) to get large files onto the
> server? What are some alternatives to simple DSpace upload currently
> utilized -- like globus for instance?
>
> I realize the answer to these questions will always include "it
> depends...", but are these all questions you have had at your institution
> and how have you dealt with them?
>
> Thanks for any discussion you wish to contribute!
> ~~ Bill
>
> --
> Human wheels spin round and round
> While the clock keeps the pace...
> -- John Mellencamp
> ________________________________________________________________
> Bill Tantzen University of Minnesota Libraries
> 612-626-9949 (U of M) 612-325-1777 (cell)
>

--
All messages to this mailing list should adhere to the Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups "DSpace Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-community/b1cc5535-3171-48f8-ba7c-495b1631e5adn%40googlegroups.com.
