Thanks for the responses, Chris and Ben.  Your thoughts are very helpful!

--Jamey

On 1/28/11 9:36 AM, "Chris Wilper" <cwil...@duraspace.org> wrote:

>Hi Jamey,
>
>On Tue, Jan 25, 2011 at 6:12 PM, Wood, Jamey <jamey.w...@nrel.gov> wrote:
>> Hello,
>>
>> I'm trying to understand how Fedora Commons might be applied to
>>managing datasets that:
>>
>>  * May each consist of several thousand individual files
>
>Such datasets would be best modeled as multiple Fedora objects, with
>relationships defined between them.  Fedora has no hard limit on the
>number of datastreams that can be stored in each object, but there are
>practical limits (e.g. memory) that suggest you should stick to a
>relatively small number of datastreams per object.  Combined with the
>modeling flexibility you get by following an "atomistic" approach, I'd
>suggest keeping it down to less than a dozen per object.  One popular
>approach is to have a single primary stream per object, with a few
>datastreams that act as metadata for the object.
>
>>  * May have files organized in some meaningful
>> hierarchical directory structure (e.g. "type1/subtype1/file1.csv")
>
>If you choose to have Fedora manage the datastreams (control group =
>"M"), you lose control over how the paths are allocated in storage.
>If tight control over the paths is needed, you can use externally
>referenced datastreams instead (control group "E").  With control
>group "E", the content (or just the location) can still be accessed
>through the Fedora APIs, but you have to make any needed modifications
>to it out of band.
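To illustrate the control group "E" option, here is a minimal sketch of how a dataset's relative file paths could be kept intact in the external locations that Fedora would reference. The base URL, the "DATA" datastream ID, and the dictionary layout are all assumptions made for illustration; this is not Fedora's API, only a way of preparing the values you would hand to it.

```python
from urllib.parse import quote


def external_location(base_url: str, relative_path: str) -> str:
    """Join a (hypothetical) dataset base URL and a relative file path.

    The directory hierarchy is preserved because quote() leaves "/" alone
    by default and only escapes characters such as spaces.
    """
    return base_url.rstrip("/") + "/" + quote(relative_path.lstrip("/"))


def datastream_record(base_url: str, relative_path: str) -> dict:
    """Describe one externally referenced datastream.

    The keys here are illustrative names, not Fedora field names.
    """
    return {
        "dsID": "DATA",          # assumed datastream ID
        "controlGroup": "E",     # externally referenced content
        "location": external_location(base_url, relative_path),
    }


record = datastream_record("http://data.example.org/dataset1/",
                           "type1/subtype1/file1.csv")
print(record["location"])
```

Because the content stays where it is, any later move or rename of the file has to be followed by an update of the stored location.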
>
>>  * Would benefit from some form of "whole-object" versioning
>> (along the lines used by the eSciDoc project [1])
>
>I think the approach described in that paper has worked well for the
>eSciDoc project.  As mentioned in the paper, Fedora's built-in
>versioning is only at the datastream level, but more powerful,
>higher-order versioning can be done through the use of special
>relationships that are understood by your application.
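As a rough sketch of what application-level versioning through relationships might look like: each dataset revision is its own object, and a relationship links it to the revision it replaces. The predicate name "isRevisionOf" and the PIDs are hypothetical, and the sketch assumes a simple linear chain with at most one successor per revision. It is not the eSciDoc scheme, only an illustration of the general idea.

```python
REVISION_OF = "isRevisionOf"  # hypothetical application-defined predicate


def latest_version(pid, triples):
    """Follow revision links forward from pid to the newest revision.

    triples is an iterable of (subject, predicate, object) tuples, where
    (new, REVISION_OF, old) means "new" replaces "old".
    """
    # Map each old revision to the revision that replaced it.
    successor = {obj: subj for subj, pred, obj in triples if pred == REVISION_OF}
    seen = {pid}
    while pid in successor:
        pid = successor[pid]
        if pid in seen:
            raise ValueError("cycle in revision chain")
        seen.add(pid)
    return pid


triples = [
    ("demo:dataset-v2", REVISION_OF, "demo:dataset-v1"),
    ("demo:dataset-v3", REVISION_OF, "demo:dataset-v2"),
]
print(latest_version("demo:dataset-v1", triples))
```

In a real repository the triples would come from the objects' relationship datastreams rather than a hard-coded list.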
>
>> [...]
>
>> One possibility I'm wondering about would be to just create
>> some kind of top-level Fedora Commons object that has a
>> pointer to the top-level data location (URL), but doesn't
>> attempt to track individual files within the dataset.  Then if
>> a new revision of the dataset is published, that top-level
>> URL pointer might be directed to some new location.
>> Is this a reasonable approach?
>> Or would it be considered bad practice?
>
>That's certainly a lightweight approach.  Whether it's appropriate
>depends on how you plan to use Fedora.  If you just want Fedora to act
>as a "registry" of your datasets (so you can describe and work with
>them at the dataset level only), I think it sounds like a reasonable
>approach.
>
>On the other hand, if you want to be able to describe the individual
>files within each dataset, I'd recommend having a Fedora object for
>each (in which you can record fixity, format, and other important
>metadata), and pointing to each using a URL (I'm assuming you'd opt
>for the "E" control group).  Then they could each be related to the
>dataset object via the RELS-EXT datastream.  This would open up more
>options, but requires a bit more thought on how to model the
>components of the dataset within Fedora, and also means you'll need to
>come up with a strategy for updating the Fedora objects when/if the
>individual files change (either in location or content).
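The per-file approach could be sketched roughly as follows: compute a checksum for fixity, and build a small RELS-EXT document that relates the file object to its dataset object. The PIDs are made up, and the choice of the isPartOf relationship is an assumption (another relationship from the Fedora relations ontology, or one of your own, may fit your model better). The sketch only builds the RDF/XML; ingesting it into Fedora is left out.

```python
import hashlib
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL_NS = "info:fedora/fedora-system:def/relations-external#"

# Use readable prefixes when the XML is serialized.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rel", REL_NS)


def sha1_of(data: bytes) -> str:
    """Return a hex SHA-1 digest to record as fixity information."""
    return hashlib.sha1(data).hexdigest()


def rels_ext(file_pid: str, dataset_pid: str) -> str:
    """Build RELS-EXT RDF/XML stating that file_pid is part of dataset_pid."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{file_pid}"},
    )
    ET.SubElement(
        desc,
        f"{{{REL_NS}}}isPartOf",
        {f"{{{RDF_NS}}}resource": f"info:fedora/{dataset_pid}"},
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical PIDs, for illustration only.
print(rels_ext("demo:file1", "demo:dataset1"))
print(sha1_of(b"abc"))
```

If a file's content changes, the stored checksum would need to be recomputed; if it moves, the external location would need updating, which is the synchronization strategy mentioned above.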
>
>- Chris
>
>--------------------------------------------------------------------------
>----
>Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
>Finally, a world-class log management solution at an even better
>price-free!
>Download using promo code Free_Logger_4_Dev2Dev. Offer expires
>February 28th, so secure your free ArcSight Logger TODAY!
>http://p.sf.net/sfu/arcsight-sfd2d
>_______________________________________________
>Fedora-commons-users mailing list
>Fedora-commons-users@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/fedora-commons-users

