You might like to get in touch with the AIMS project:

http://www2.lib.virginia.edu/aims/

which is tackling exactly the kinds of issues to which you refer. One of their 
aims is to develop a workflow based on Hydra components that can handle 
the kind of arrangement and description problem that you so rightly describe as 
difficult to solve in Fedora by itself.
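
For what it's worth, the object-per-file modelling suggested further down the 
thread (one parent object for the disk image, one small object per file, linked 
by a relationship) can be sketched roughly as below. This is only an 
illustration, not anyone's actual workflow: the PIDs, the "demo" namespace and 
the helper names rels_ext() and plan_objects() are made up, the choice of the 
isPartOf relationship is an assumption, and nothing here talks to a Fedora 
server. It only builds the RELS-EXT RDF/XML that each per-file object would 
carry.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Fedora's relations-external ontology namespace (used in RELS-EXT).
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def rels_ext(child_pid, parent_pid):
    """Return RELS-EXT RDF/XML stating that child_pid isPartOf parent_pid."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{child_pid}"},
    )
    ET.SubElement(
        desc,
        f"{{{REL_NS}}}isPartOf",
        {f"{{{RDF_NS}}}resource": f"info:fedora/{parent_pid}"},
    )
    return ET.tostring(root, encoding="unicode")


def plan_objects(parent_pid, file_paths, namespace="demo"):
    """Plan one object per file, each pointing back at the parent object.

    The PID scheme (namespace:file-N) is hypothetical; a real repository
    would normally hand out PIDs itself.
    """
    plan = []
    for index, path in enumerate(file_paths, start=1):
        pid = f"{namespace}:file-{index}"
        plan.append(
            {"pid": pid, "path": path, "rels_ext": rels_ext(pid, parent_pid)}
        )
    return plan


if __name__ == "__main__":
    for entry in plan_objects("demo:disk-1", ["README.TXT", "docs/letter.doc"]):
        print(entry["pid"], entry["path"])
        print(entry["rels_ext"])
```

The parent object then stays small (it needs no datastream per file), and the 
set of files can be recovered by querying for objects that are part of it.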

---
A. Soroka
Digital Research and Scholarship R & D and Online Library Environment
the University of Virginia Library




On Dec 21, 2010, at 10:52 AM, Peter Cliff wrote:

> I do not know about M. Jallud's domain, but this brings out something the 
> futureArch project at the University of Oxford have wrestled with. Here we 
> needed to ingest disk images. Each image is itself a nice, self-contained 
> file system. Some are small floppy disks, others are larger hard drive images 
> and the like. The smallest of these (so far) is 2GB and contains several 
> thousand files (I don't have the exact figure to hand, but suffice to say 
> more than the number of data streams it'd be sensible to attach to a Fedora 
> object - though many of those files are probably OS junk, etc.).
> 
> In an ideal world, each disk image would be appraised and the individual 
> useful files extracted and put into a repository individually. In reality, 
> sifting through a disk image of that number of files is about as onerous as 
> sifting through a large number of boxes and so it can take time and staff and 
> thus the disk image needs preserving until we get those resources to address 
> it. Further, a bit-by-bit copy of the disk may contain useful research data 
> in itself...
> 
> In deciding if we use Fedora as repository for these disk images, the 
> question was how to model the image and its files in Fedora and we thought of 
> two ways:
> 
> 1) Ingest the disk image and add a datastream per file, as per this thread. 
> As you can imagine, that isn't a great way to use Fedora...
> 
> 2) Break the image up into files and ingest each and create a contents list 
> with associated file system metadata, etc. with each file. This seems doable, 
> but it seems a large overhead just to use Fedora.
> 
> Which led to the conclusion that Fedora probably wasn't the tool *for this 
> particular job* (don't flame me - I'm well aware of the many good uses for 
> Fedora!) but this has been bugging me ever since and perhaps we're just the 
> victims of a "desire to map preexisting persistence architectures"... :-)
> 
> Pete Cliff
> Bodleian Library
> 
> On 21 Dec 2010, at 15:15, <aj...@virginia.edu> wrote:
> 
>> That is the point I was getting at: I wonder whether M. Jallud's domain is 
>> being effectively and efficiently represented in Fedora.
>> 
>> Something I see a great deal in early use of Fedora is the desire to map 
>> preexisting persistence architectures directly onto the repository. E.g. the 
>> expectation that a "directory of files" will become an "object of 
>> datastreams".
>> 
>> I don't know what M. Jallud is thinking and I don't mean to imply any 
>> criticism, but I do wonder about any Fedora-based architecture featuring 
>> objects with thousands of datastreams. It can be objectively said that such 
>> an architecture is not at all idiomatic.
>> 
>> ---
>> A. Soroka
>> Digital Research and Scholarship R & D and Online Library Environment
>> the University of Virginia Library
>> 
>> 
>> 
>> 
>> On Dec 21, 2010, at 10:06 AM, Alex Rodriguez Lopez wrote:
>> 
>>> Hi.
>>> 
>>> Maybe I'm missing something here, but wouldn't it be a better approach to 
>>> create new objects (each with one datastream, or a few, but not hundreds) 
>>> for each file and have them relate to the primary object? See 
>>> https://wiki.duraspace.org/display/FCR30/Digital+Object+Relationships
>>> 
>>> Instead of having 1 object with 1000s of datastreams, you have 1 object 
>>> linked to 1000s of objects (each with one datastream).
>>> 
>>> Unless you *REALLY* need everything to reside in one big XML file...
>>> 
>>> Pierre-Yves JALLUD, 21-12-2010 14:52:
>>>> Thanks for your answers. They confirm my impression that the objects I
>>>> wanted to store in FedoraCommons are not suited to this kind of
>>>> system. I'll require the users to split their archives into an
>>>> acceptable number of files. They used to have a maximum of 1000 or 2000
>>>> datastreams (exceptionally), and FC has acceptable response times at that
>>>> size. That will be the limit of my system.
>>>> Thank you again and greetings
>>>> 
>>>>> I am wondering a little about the data model in play here. I may have
>>>>> missed an earlier part of this conversation, but I wonder if you could
>>>>> describe your domain problem a little, M. Jallud?
>>>>> Perhaps we can find a more efficient and idiomatic way to use Fedora's
>>>>> CMA than is now obvious to you... to have more than a few dozen
>>>>> datastreams in a content model is very unusual and
>>>>> implies the possibility of useful refactoring.
>>>>> 
>>>>> ---
>>>>> A. Soroka
>>>>> Digital Research and Scholarship R & D and Online Library Environment
>>>>> the University of Virginia Library
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Dec 20, 2010, at 9:00 AM, Asger Askov Blekinge wrote:
>>>>> 
>>>>>> Sounds about right, but this is not a hard limit.
>>>>>> 
>>>>>> As you know, Fedora stores the datastreams in one big xml file.
>>>>>> 
>>>>>> What is the maximum size of xml files? How many elements can there be
>>>>>> in an xml list? How long do you want to wait for fedora to parse this
>>>>>> object? Those are the relevant questions, and by answering them, you
>>>>>> will have answered your original question.
>>>>>> 
>>>>>> Regards
>>>>>> 
>>>>>> 
>>>>>> On Mon, 2010-12-20 at 14:54 +0100, Pierre-Yves JALLUD wrote:
>>>>>>> Hi everyone,
>>>>>>> I'm using version 3.2.1 of FedoraCommons. I wonder what the maximum
>>>>>>> number of datastreams is that we can add to a single object. My
>>>>>>> experiments seem to show that this number is around 32000
>>>>>>> (32768?...). Is that true? Is that still true in the latest versions?
>>>>>>> 
>>>>>>> Thanks for your answers.
>>>>>>> Pierre-Yves
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Fedora-commons-users mailing list
>>>> Fedora-commons-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>>> 
>> 
>> 
> 
> 


