Thanks for the responses, Chris and Ben. Your thoughts are very helpful!

--Jamey

On 1/28/11 9:36 AM, "Chris Wilper" <cwil...@duraspace.org> wrote:

>Hi Jamey,
>
>On Tue, Jan 25, 2011 at 6:12 PM, Wood, Jamey <jamey.w...@nrel.gov> wrote:
>> Hello,
>>
>> I'm trying to understand how Fedora Commons might be applied to
>> managing datasets that:
>>
>> * May each consist of several thousand individual files
>
>Such datasets would be best modeled as multiple Fedora objects, with
>relationships defined between them. Fedora has no hard limit on the
>number of datastreams that can be stored in each object, but there are
>practical limits (e.g. memory) that suggest you should stick to a
>relatively small number of datastreams per object. Combined with the
>modeling flexibility you get by following an "atomistic" approach, I'd
>suggest keeping it to fewer than a dozen per object. One popular
>approach is to have a single primary datastream per object, with a few
>datastreams that act as metadata for the object.
>
>> * May have files organized in some meaningful
>> hierarchical directory structure (e.g. "type1/subtype1/file1.csv")
>
>If you choose to have Fedora manage the datastreams (control group
>"M"), you lose control over how the paths are allocated in storage.
>If tight control over the paths is needed, you can use externally
>referenced datastreams instead (control group "E"). With control
>group "E", the content (or just the location) can still be accessed
>through the Fedora APIs, but you have to make any needed modifications
>to it out of band.
>
>> * Would benefit from some form of "whole-object" versioning
>> (along the lines used by the eSciDoc project [1])
>
>I think the approach described in that paper has worked well for the
>eSciDoc project. As mentioned in the paper, Fedora's built-in
>versioning is only at the datastream level, but more powerful,
>higher-order versioning can be done through the use of special
>relationships that are understood by your application.
>
>> [...]
>
>> One possibility I'm wondering about would be to just create
>> some kind of top-level Fedora Commons object that has a
>> pointer to the top-level data location (URL), but doesn't
>> attempt to track individual files within the dataset. Then if
>> a new revision of the dataset is published, that top-level
>> URL pointer might be directed to some new location.
>> Is this a reasonable approach?
>> Or would it be considered bad practice?
>
>That's certainly a lightweight approach. Whether it's appropriate
>depends on how you plan to use Fedora. If you just want Fedora to act
>as a "registry" of your datasets (so you can describe and work with
>them at the dataset level only), I think it sounds like a reasonable
>approach.
>
>On the other hand, if you want to be able to describe the individual
>files within each dataset, I'd recommend having a Fedora object for
>each (in which you can record fixity, format, and other important
>metadata), and pointing to each using a URL (I'm assuming you'd opt
>for the "E" control group). Then they could each be related to the
>dataset object via the RELS-EXT datastream. This would open up more
>options, but requires a bit more thought on how to model the
>components of the dataset within Fedora, and also means you'll need to
>come up with a strategy for updating the Fedora objects when/if the
>individual files change (either in location or content).
>
>- Chris
>
>------------------------------------------------------------------------------
>Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
>Finally, a world-class log management solution at an even better
>price-free!
>Download using promo code Free_Logger_4_Dev2Dev. Offer expires
>February 28th, so secure your free ArcSight Logger TODAY!
>http://p.sf.net/sfu/arcsight-sfd2d
>_______________________________________________
>Fedora-commons-users mailing list
>Fedora-commons-users@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
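
For anyone picking up this thread later: the per-file modeling Chris describes (one Fedora object per file, content referenced by URL with control group "E", each object related to the dataset object via RELS-EXT) could be planned out roughly as below. This is only a sketch under stated assumptions. The PIDs ("demo:dataset1" and so on), the base URL, the helper names, and the choice of the isPartOf predicate from Fedora's relations-external ontology are all illustrative and not something prescribed in the thread. The code only builds the RELS-EXT RDF/XML and an ingest plan; it does not call any Fedora API.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace of Fedora's relations-external ontology (isPartOf, isMemberOf, ...).
REL_NS = "info:fedora/fedora-system:def/relations-external#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rel", REL_NS)


def rels_ext_for_file(file_pid, dataset_pid):
    """Build RELS-EXT RDF/XML saying the file object is part of the dataset object.

    isPartOf is an assumed choice of predicate; another relationship from the
    same ontology (or one of your own) may suit your model better.
    """
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{file_pid}"},
    )
    ET.SubElement(
        description,
        f"{{{REL_NS}}}isPartOf",
        {f"{{{RDF_NS}}}resource": f"info:fedora/{dataset_pid}"},
    )
    return ET.tostring(rdf, encoding="unicode")


def plan_file_objects(dataset_pid, base_url, relative_paths):
    """Describe one object per file, keeping the directory layout in the URL.

    The PID scheme (<dataset_pid>-file<N>) is hypothetical.
    """
    plans = []
    for index, path in enumerate(relative_paths, start=1):
        pid = f"{dataset_pid}-file{index}"
        plans.append({
            "pid": pid,
            "control_group": "E",  # externally referenced content
            "location": f"{base_url.rstrip('/')}/{path}",
            "rels_ext": rels_ext_for_file(pid, dataset_pid),
        })
    return plans


if __name__ == "__main__":
    for plan in plan_file_objects(
        "demo:dataset1",
        "http://example.org/data",  # hypothetical location of the dataset
        ["type1/subtype1/file1.csv", "type1/subtype2/file2.csv"],
    ):
        print(plan["pid"], plan["location"])
        print(plan["rels_ext"])
```

Because the hierarchical path lives only in the external URL, a new dataset revision that moves files means regenerating the locations and updating the affected objects, which is the update strategy Chris notes you would need to work out.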