Robin,

that's interesting - it would be great to hear more about the enhancements 
you have made (for us it would be most interesting for DSpace 5) - maybe 
your colleagues can add more details about that? Do you have those 
enhancements publicly available on GitHub?
The "Download All" button sounds interesting as well - actually we had a 
request about something pretty similar recently. 

Best regards
Oliver

rice wrote on Friday, 30 April 2021 at 18:34:25 UTC+2:

> Hello,
> I saw this and want to give a brief response. I am about to go on leave, but 
> my colleagues John Pinto or Pauline Ward could say more.
>
> In Edinburgh DataShare (https://datashare.ed.ac.uk/), an institutional 
> data repository, we have made some enhancements (first in DSpace 5, now in 
> 6.x) to allow drag-and-drop uploads of up to 20 GB per item. We will also 
> allow batch import of up to 100 GB per item. We have tested this, and for 
> larger data downloads we show a message that the download will take a 
> while, sometimes more than a day, but it is resumable and so robust.
>
> We also have a 'download all' button which points to a zip file of the 
> item's bitstreams, since most datasets have numerous files.
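
A 'download all' archive of this kind could be produced roughly as follows. This is a minimal sketch only, not the DataShare implementation: the function name `zip_item_files` and the idea of building the archive from files on disk are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def zip_item_files(file_paths, zip_path):
    """Write all files belonging to one item into a single zip archive.

    file_paths -- paths of the item's files on disk (hypothetical input)
    zip_path   -- where the resulting archive is written
    """
    # ZIP_STORED skips compression, which is often reasonable for research
    # data that is already compressed. allowZip64 permits archives > 4 GB.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED,
                         allowZip64=True) as archive:
        for path in map(Path, file_paths):
            # Store each file under its bare name, without directories.
            archive.write(path, arcname=path.name)
    return zip_path
```

Files with identical names would collide in such an archive, so a real implementation would need a naming rule for that case.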
>
> Cheers,
> Robin Rice
> University of Edinburgh Library
>
>
> On Friday, 9 April 2021 at 15:28:42 UTC+1 [email protected] wrote:
>
>> Bill,
>>
>> I just saw your request on discussing bitstream sizes in DSpace and would 
>> like to join the conversation.
>>
>> We also have some reservations about uploading large files to DSpace via 
>> the Web UI. For us, the reasonable limit for uploading this way is about 
>> 5 GB. If someone wants to publish data larger than this (which has 
>> happened a couple of times so far, and we expect it to happen more often 
>> in the future), we offer them the option of uploading the files to our 
>> server via WebDAV. Once we have the files, we build an SAF import package 
>> from them and ingest it into DSpace on behalf of the user, sending it 
>> into the normal approval workflow. There we reject the item, so that the 
>> user has the chance to review and change the metadata before it is 
>> eventually approved. Another scenario using a similar approach is to let 
>> the user create a new item in DSpace, but tell them not to upload the file 
>> within that process and to use the WebDAV option instead. If we know the 
>> number of the item the user created and have the files, we can add them 
>> with the dspace command (however, this is a new function The Library 
>> Code implemented for us and thus not yet available in generic DSpace).
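
Building an SAF (Simple Archive Format) package from received files might look roughly like this. It is a sketch under stated assumptions, not the script used above: the function name `build_saf_item`, its parameters, and the choice to copy the files into the package are all hypothetical. The layout itself (one directory per item containing `dublin_core.xml`, a `contents` manifest and the files) follows the documented SAF structure.

```python
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def build_saf_item(files, metadata, archive_dir, item_name="item_000"):
    """Create one SAF item directory and return its path.

    files    -- paths of the files received from the depositor
    metadata -- list of (element, qualifier, value) Dublin Core triples;
                qualifier may be None
    """
    item_dir = Path(archive_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=False)

    # dublin_core.xml: one <dcvalue> element per metadata field.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(root, "dcvalue", element=element,
                                qualifier=qualifier or "none")
        dcvalue.text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # Copy the files and list them, one name per line, in "contents".
    names = []
    for src in map(Path, files):
        shutil.copy2(src, item_dir / src.name)
        names.append(src.name)
    (item_dir / "contents").write_text(
        "".join(f"{name}\n" for name in names), encoding="utf-8")
    return item_dir
```

The resulting directory can then be passed to the `dspace import` command, for example with `--add`, `--eperson`, `--collection`, `--source` and `--mapfile`, plus `--workflow` to send the item through the collection's workflow. The exact options should be checked against the documentation for the DSpace version in use.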
>>
>> We do not use the upload limit in DSpace; as far as I remember, it was 
>> not working properly in the DSpace 5 JSPUI. Admittedly, I haven't tried 
>> it for a long time, so this issue may have been solved in the 
>> meantime.
>>
>> Let me add another aspect of large files, but in the other direction 
>> (download rather than upload): we store all of our bitstreams on S3 
>> storage, and as DSpace 5 does not natively support that, we use Cloudian 
>> Hyperfile, which provides an NFS-mountable volume that is linked to our 
>> assetstore. That means all of the bitstreams (including thumbnails, 
>> licenses and so on) go to the S3 storage. This basically works fine, but 
>> with large files we once had some trouble with web crawlers harvesting 
>> those files: if too many users fetched too many of the large files in 
>> parallel, this caused cache problems on the Hyperfile volume. To avoid 
>> this, we tuned some of the cache settings on Hyperfile, and we excluded 
>> the big files from being crawled in our robots.txt (as we think it would 
>> be rather useless to crawl them at all). That has solved those problems 
>> so far. But I guess it's something worth noting.
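
Excluding large files in robots.txt could be automated along these lines. This is only a sketch: the function `robots_rules`, the (path, size) input and the example URL paths are assumptions for illustration, and the real bitstream URLs depend on the DSpace UI in use.

```python
def robots_rules(bitstreams, size_limit_bytes, user_agent="*"):
    """Return robots.txt text disallowing every bitstream above the limit.

    bitstreams -- iterable of (url_path, size_in_bytes) pairs, for example
                  taken from a database query (hypothetical input)
    """
    lines = [f"User-agent: {user_agent}"]
    for url_path, size in sorted(bitstreams):
        if size > size_limit_bytes:
            lines.append(f"Disallow: {url_path}")
    return "\n".join(lines) + "\n"


# Hypothetical example paths; the 5 GB threshold is chosen arbitrarily.
example = [
    ("/bitstream/handle/123456789/10/big-dataset.zip", 6 * 1024**3),
    ("/bitstream/handle/123456789/11/article.pdf", 2 * 1024**2),
]
print(robots_rules(example, 5 * 1024**3), end="")
```

Note that robots.txt is only advisory: well-behaved crawlers honour it, but it does not technically prevent anyone from fetching the files.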
>>
>> If anyone has experience with other download regulators that prevent a 
>> user from downloading too many files in parallel, I would be eager to 
>> hear about it.
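
To make concrete what such a regulator would have to do, here is a minimal in-process sketch that caps the number of simultaneous downloads per client. It is not an existing DSpace feature; the class `DownloadLimiter` and the idea of keying on a client identifier such as an IP address are assumptions for illustration.

```python
import threading
from collections import defaultdict


class DownloadLimiter:
    """Allow at most max_parallel concurrent downloads per client key."""

    def __init__(self, max_parallel):
        self.max_parallel = max_parallel
        self._active = defaultdict(int)   # client key -> running downloads
        self._lock = threading.Lock()

    def try_acquire(self, client):
        """Reserve a download slot; return False if the client is at its cap."""
        with self._lock:
            if self._active[client] >= self.max_parallel:
                return False
            self._active[client] += 1
            return True

    def release(self, client):
        """Free a slot once the download has finished or been aborted."""
        with self._lock:
            if self._active[client] > 0:
                self._active[client] -= 1
            if self._active[client] == 0:
                del self._active[client]
```

A request handler would call `try_acquire` before streaming a file, answer with an error such as HTTP 429 when it returns False, and call `release` in a `finally` block. In practice the same effect is often achieved with connection limits in a reverse proxy in front of the application.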
>>
>> Best
>> Oliver
>>
>> Bill Tantzen wrote on Thursday, 1 April 2021 at 20:23:12 UTC+2:
>>
>>> If you have a minute, I am trying to get a feel for some of the larger 
>>> (reasonable) bitstreams the community is currently supporting.  On my site, 
>>> we have removed the DSpace upload limits to allow for records containing 
>>> research data, but of course there are practical limits that dictate what 
>>> makes for a good user experience.
>>>
>>> What is the largest bitstream you support?  Do you enforce upload 
>>> limits?  Assuming download speeds are faster than upload speeds, what are 
>>> some of the methods in use (besides the DSpace GUI) to get large files onto 
>>> the server?  What are some alternatives to simple DSpace upload currently 
>>> utilized -- like Globus for instance?
>>>
>>> I realize the answer to these questions will always include "it 
>>> depends...", but are these all questions you have had at your institution 
>>> and how have you dealt with them?
>>>
>>> Thanks for any discussion you wish to contribute!
>>> ~~ Bill
>>>
>>> -- 
>>> Human wheels spin round and round
>>> While the clock keeps the pace... -- John Mellencamp
>>> ________________________________________________________________
>>> Bill Tantzen    University of Minnesota Libraries
>>> 612-626-9949 (U of M)    612-325-1777 (cell)
>>>
>>

-- 
All messages to this mailing list should adhere to the Code of Conduct: 
https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-community/7431e2de-5692-4891-b449-5510fba5b332n%40googlegroups.com.
