Hi Jamey,

On Tue, Jan 25, 2011 at 6:12 PM, Wood, Jamey <jamey.w...@nrel.gov> wrote:
> Hello,
>
> I'm trying to understand how Fedora Commons might be applied to managing
> datasets that:
>
> * May each consist of several thousand individual files

Such datasets would be best modeled as multiple Fedora objects, with
relationships defined between them. Fedora has no hard limit on the
number of datastreams that can be stored in each object, but there are
practical limits (e.g. memory) that suggest you should stick to a
relatively small number of datastreams per object. Given that, and the
modeling flexibility you get by following an "atomistic" approach, I'd
suggest keeping it to fewer than a dozen per object. One popular
approach is to have a single primary datastream per object, plus a few
datastreams that act as metadata for the object.

> * May have files organized in some meaningful
> hierarchical directory structure (e.g. "type1/subtype1/file1.csv")

If you choose to have Fedora manage the datastreams (control group
"M"), you lose control over how the paths are allocated in storage. If
tight control over the paths is needed, you can use externally
referenced datastreams instead (control group "E"). With control group
"E", the content (or just its location) can still be accessed through
the Fedora APIs, but you have to make any needed modifications to the
content out of band.

> * Would benefit from some form of "whole-object" versioning
> (along the lines used by the eSciDoc project [1])

I think the approach described in that paper has worked well for the
eSciDoc project. As mentioned in the paper, Fedora's built-in
versioning is only at the datastream level, but more powerful,
higher-order versioning can be done through the use of special
relationships that are understood by your application.

> [...]
> One possibility I'm wondering about would be to just create
> some kind of top-level Fedora Commons object that has a
> pointer to the top-level data location (URL), but doesn't
> attempt to track individual files within the dataset. Then if
> a new revision of the dataset is published, that top-level
> URL pointer might be directed to some new location.
> Is this a reasonable approach?
> Or would it be considered bad practice?

That's certainly a lightweight approach. Whether it's appropriate
depends on how you plan to use Fedora. If you just want Fedora to act
as a "registry" of your datasets (so you can describe and work with
them at the dataset level only), I think it sounds like a reasonable
approach.

On the other hand, if you want to be able to describe the individual
files within each dataset, I'd recommend having a Fedora object for
each file (in which you can record fixity, format, and other important
metadata), and pointing to each file using a URL (I'm assuming you'd
opt for the "E" control group). Each file object could then be related
to the dataset object via its RELS-EXT datastream. This would open up
more options, but it requires a bit more thought on how to model the
components of the dataset within Fedora, and it also means you'll need
a strategy for updating the Fedora objects when/if the individual
files change (either in location or content).

- Chris

_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
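
As a rough illustration of the per-file modeling suggested above, the following Python sketch builds the RDF/XML that a file object's RELS-EXT datastream might contain to relate it to its dataset object. The PIDs ("demo:file1", "demo:dataset1") and the function name are hypothetical, and the choice of the isMemberOf predicate from the Fedora relationship ontology is an assumption; another predicate may suit your model better. The sketch only produces the XML and does not talk to a Fedora server.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace of the Fedora relationship ontology used in RELS-EXT.
RELS_NS = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(file_pid: str, dataset_pid: str) -> str:
    """Return RELS-EXT RDF/XML stating that file_pid isMemberOf dataset_pid.

    Both PIDs are placeholders supplied by the caller (hypothetical here).
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", RELS_NS)

    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    # The subject of the statement is the file object itself.
    description = ET.SubElement(
        rdf,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{file_pid}"},
    )
    # The object of the statement is the dataset object.
    ET.SubElement(
        description,
        f"{{{RELS_NS}}}isMemberOf",
        {f"{{{RDF_NS}}}resource": f"info:fedora/{dataset_pid}"},
    )
    return ET.tostring(rdf, encoding="unicode")


if __name__ == "__main__":
    print(build_rels_ext("demo:file1", "demo:dataset1"))
```

The resulting XML would still need to be stored as the RELS-EXT datastream of the file object by whatever ingest or update mechanism you use; that step is left out here.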