> This week's question has to do with bitstreams. DSpace is designed
> around discrete papers contained within single bitstreams, and it also
> handles websites reasonably well. The question is: what else do you
> have, what have you done with/to DSpace to accommodate it, and what
> else do you need from DSpace?

Audio and Video

As others have said, audio and video are challenges because of file size, preservation standards (or lack thereof), and typical user needs. We have had enough video deposited, and seen enough interest in preserving and presenting it, to notice a few things that repeat:

(1) Deposit almost always has to be mediated, since users either don't have the tools to convert what they have into something they can deposit, or don't want to wait while a very large file transfers via the web form.

(2) If we didn't have a partnership with a streaming service on campus, people wouldn't deposit audio and video in Deep Blue. Full stop. Preservation is a secondary concern for most -- they want to be sure others have access, and they don't have the server space to provide it. And access via "download the whole thing to the desktop" doesn't work for two reasons: it's too slow and, depending on the encoding, the downloaded file might not play anyway. We're measured against the convenience and usability of YouTube...and yes, that's not fair, and no, pointing out that we're more reliable than YouTube doesn't matter. (See above re. preservation as a secondary concern.)

(3) Depositors often don't have the option -- or don't know how -- to choose how they capture video, so you get what they produce and live with it.

We've created best practices for creating high quality text, image, and audio files, but we're stuck on defining what a best practice would be for digital video. If others have settled on best practices, I'd love to hear how they've decided to define preservation quality video. Not that users will deliver preservation quality even if you tell them what it is, but it would be nice to know what we mean by it.

Websites and XML-based complex objects

In the realm of complex objects, those wrapped in HTML and XML are easy enough to preserve and present, though I disagree that DSpace handles them reasonably well.
Once they're in DSpace all's well, but getting them in is tedious (or more accurately, incredibly tedious). Two features would be fantastic to have: (a) handling nested directory structures without having to "flatten out" a website completely by rewriting internal links and renaming files, and (b) being able to upload directories, nested or not.

Complex Objects

We get relatively few requests for things like lecture objects and things that require complex interactions between files, but when we do I'm usually able to help their producers understand how objects tied to a specific platform, operating system, or piece of software are inherently difficult to preserve in the long term. (A few examples of ubiquitous but now dead programs and companies usually suffice. Heck, just reminding people of how you can lose functionality from one version of PowerPoint to the next will usually suffice.) So I'm not too worried that DSpace doesn't handle these things seamlessly, since most of my depositors don't expect it to, yet.

Bitstreams in general

I think DSpace's biggest weakness is its trusting nature: if the depositor says it's a PDF, DSpace believes it. We've just begun to look into how we might go one small step further by using JHOVE to at least identify the format. If the depositor says it's a PDF, does that appear to be true? Never mind validating and characterizing, at least for now. Sticking with PDF, the next step would be to differentiate between PDF and PDF/A...though as mentioned above, for those of us who embrace an unmediated, self-deposit mode, we can lead our users to best practices but we can't make them use them. So we'd have to think hard about whether we'd want to reject a sub-optimal (but still understandable and usable) file, or even alert depositors to it.
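
To make that "one small step" concrete, here is a minimal sketch in Python. It is not JHOVE and not anything DSpace does today; it is a stand-in that only looks at the bytes. The function names and the returned messages are hypothetical, and the 1024-byte window for finding the header is an assumption of the sketch. The PDF/A check only detects whether the file declares itself as PDF/A in its metadata; it says nothing about whether the file actually conforms.

```python
PDF_SIGNATURE = b"%PDF-"
# XMP property a PDF/A file is expected to use to declare itself.
PDFA_MARKER = b"pdfaid:part"


def looks_like_pdf(data: bytes) -> bool:
    """True if a PDF header appears near the start of the file.

    Assumption: we accept the header anywhere in the first 1024 bytes,
    since some files carry a little leading junk.
    """
    return PDF_SIGNATURE in data[:1024]


def claims_pdfa(data: bytes) -> bool:
    """True if the file looks like a PDF and mentions a PDF/A part.

    This detects the claim only, not actual PDF/A conformance.
    """
    return looks_like_pdf(data) and PDFA_MARKER in data


def check_deposit(data: bytes, claimed_format: str) -> str:
    """Compare the depositor's claim with what the bytes suggest.

    Returns a short note for staff; it never rejects anything.
    """
    if claimed_format.lower() != "pdf":
        return "not checked"
    if not looks_like_pdf(data):
        return "claimed PDF, but no PDF header found"
    if claims_pdfa(data):
        return "PDF, declares PDF/A"
    return "PDF, no PDF/A declaration"
```

Returning a note instead of raising an error is deliberate: it leaves open the question above of whether to reject a sub-optimal file, alert the depositor, or just record what was found.
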

____________________________________
Jim Ottaviani                  +1 734-763-4835
Coordinator, Deep Blue         http://deepblue.lib.umich.edu
University of Michigan Library

Quis custodiet ipsos custodes
  --Juvenal, Satires VI, 347
_______________________________________________
Dspace-general mailing list
http://mailman.mit.edu/mailman/listinfo/dspace-general
