Bill,

I just saw your request to discuss bitstream sizes in DSpace and would 
like to join the conversation.

We also have some reservations about uploading large files to DSpace via 
the Web UI. For us, the reasonable limit for uploads this way is about 
5 GB. If someone wants to publish data larger than this (which has happened 
a couple of times already, and we expect it to happen more often in the 
future), we offer to let them upload the files to our server via WebDAV. 
Once we have the files, we build an SAF (Simple Archive Format) import 
package with them and ingest it into DSpace on behalf of the user, sending 
it through the normal approval workflow. There we reject the item, so that 
the user has the chance to review and change the metadata before it is 
eventually approved. Another scenario using a similar approach is to let 
the user create a new item in DSpace, but tell them not to upload the file 
within that process and to use the WebDAV option instead. If we know the 
number of the item the user created and have the files, we can add the 
file with the dspace command (however, this is a new function The Library 
Code implemented for us and thus not yet available in generic DSpace).
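
For readers unfamiliar with SAF, here is a minimal sketch of how one item 
directory of such a package could be assembled. It is an illustration only, 
not the tooling described above: the function name, the item directory name 
and the metadata tuples are hypothetical, and it assumes the usual SAF 
layout of one directory per item containing a dublin_core.xml, a contents 
file and the files themselves.

```python
import shutil
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def build_saf_item(package_dir, item_name, metadata, files):
    """Create one item directory inside a Simple Archive Format package.

    metadata: list of (element, qualifier, value) tuples (hypothetical shape)
    files:    paths of the files received from the user, e.g. via WebDAV
    """
    item_dir = Path(package_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml holds the descriptive metadata of the item.
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<dublin_core>"]
    for element, qualifier, value in metadata:
        lines.append(
            "  <dcvalue element=%s qualifier=%s>%s</dcvalue>"
            % (quoteattr(element), quoteattr(qualifier or "none"), escape(value))
        )
    lines.append("</dublin_core>")
    (item_dir / "dublin_core.xml").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )

    # Copy each file into the item directory and record its name.
    # For very large files a move or hard link may be preferable to a copy.
    names = []
    for src in files:
        src = Path(src)
        shutil.copy2(src, item_dir / src.name)
        names.append(src.name)

    # The contents file lists one bitstream file name per line.
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
    return item_dir
```

The resulting package directory would then be handed to the DSpace batch 
import; the exact options depend on the DSpace version in use.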

We do not use the upload limit in DSpace; as far as I remember, it was not 
working properly in the DSpace 5 JSPUI. Admittedly, I haven't tried it for 
a long time, so maybe this issue has been solved in the meantime.

Let me add another aspect of large files, in the other direction (download 
rather than upload): we store all of our bitstreams on S3 storage, and as 
DSpace 5 does not natively support that, we use Cloudian Hyperfile, which 
provides an NFS-mountable volume that is linked to our assetstore. That 
means all of the bitstreams (including thumbnails, licenses and so on) go 
to the S3 storage. This basically works fine, but with large files we once 
had some trouble with web crawlers harvesting those files: if too many 
users were fetching too many of the large files in parallel, this caused 
cache problems on the Hyperfile volume. To avoid this, we tuned some of the 
cache settings on Hyperfile and excluded the big files from being crawled 
in our robots.txt (we think it would be rather pointless to crawl them at 
all). That has solved those problems so far, but I guess it's something 
worth noting.
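
As a rough sketch of the robots.txt idea, the Disallow rules for large 
bitstreams could be generated from a list of paths and sizes. Everything 
here is an assumption made for illustration: the function name, the size 
threshold and the example paths are hypothetical, and how the list of 
bitstreams is obtained is left open.

```python
def robots_disallow_rules(bitstreams, max_bytes, user_agent="*"):
    """Return robots.txt text that disallows crawling of large bitstreams.

    bitstreams: iterable of (url_path, size_in_bytes) pairs (hypothetical input)
    max_bytes:  files larger than this are excluded from crawling
    """
    lines = ["User-agent: %s" % user_agent]
    for path, size in sorted(bitstreams):
        if size > max_bytes:
            lines.append("Disallow: %s" % path)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths and sizes, only to show the shape of the output.
    example = [
        ("/bitstream/handle/123456789/1/paper.pdf", 2 * 1024**2),
        ("/bitstream/handle/123456789/2/dataset.tar", 6 * 1024**3),
    ]
    print(robots_disallow_rules(example, max_bytes=1024**3), end="")
```

Note that robots.txt only helps with crawlers that honor it; it does not 
limit ordinary users.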

If anyone has experience with other download regulators that prevent a 
user from downloading too many files in parallel, I would be eager to hear 
about it.

Best
Oliver

On Thursday, April 1, 2021 at 20:23:12 UTC+2, Bill Tantzen wrote:

> If you have a minute, I am trying to get a feel for some of the larger 
> (reasonable) bitstreams the community is currently supporting.  On my site, 
> we have removed the DSpace upload limits to allow for records containing 
> research data, but of course there are practical limits that dictate what 
> makes for a good user experience.
>
> What is the largest bitstream you support?  Do you enforce upload limits?  
> Assuming download speeds are faster than upload speeds, what are some of 
> the methods in use (besides the DSpace gui) to get large files onto the 
> server?  What are some alternatives to simple DSpace upload currently 
> utilized -- like globus for instance?
>
> I realize the answer to these questions will always include "it 
> depends...", but are these all questions you have had at your institution 
> and how have you dealt with them?
>
> Thanks for any discussion you wish to contribute!
> ~~ Bill
>
> -- 
> Human wheels spin round and round
> While the clock keeps the pace... -- John Mellencamp
> ________________________________________________________________
> Bill Tantzen    University of Minnesota Libraries
> 612-626-9949 (U of M)    612-325-1777 (cell)
>
