Sorry to pester, but does anyone have thoughts on this?

Thanks,
Jamey

From: Jamey Wood <jamey.w...@nrel.gov>
Date: Tue, 25 Jan 2011 16:12:40 -0700
To: "fedora-commons-users@lists.sourceforge.net" <fedora-commons-users@lists.sourceforge.net>
Subject: Fedora Commons for Large Datasets with Thousands of Files

Hello,

I'm trying to understand how Fedora Commons might be applied to managing datasets that:

* May each consist of several thousand individual files
* May have files organized in some meaningful hierarchical directory structure (e.g. "type1/subtype1/file1.csv")
* Would benefit from some form of "whole-object" versioning (along the lines used by the eSciDoc project [1])

An example of one such dataset can be seen at [2] (with overview and documentation materials at [3]).

At first, I was assuming that each such dataset would be a single Fedora Commons object that would have a separate datastream for each file belonging to the dataset. And then whole-object versioning could be implemented using a special datastream, as described in the eSciDoc paper.

But after looking through this mailing list's archives, I found the "Max number of datastreams of a object" thread (from December) where multiple people noted that having Fedora Commons objects with hundreds or thousands of datastreams probably isn't a good idea (although there isn't necessarily a hard limit preventing it).

So now I'm wondering how to best model these datasets in Fedora Commons? Or is Fedora Commons simply not the right tool for this usage scenario?

One possibility I'm wondering about would be to just create some kind of top-level Fedora Commons object that has a pointer to the top-level data location (URL), but doesn't attempt to track individual files within the dataset. Then if a new revision of the dataset is published, that top-level URL pointer might be directed to some new location. Is this a reasonable approach?
Or would it be considered bad practice?

Any thoughts on this or pointers towards general best practices would be appreciated.

Thanks,
Jamey

1: https://www.escidoc.org/media/docs/ges-versioning-article.pdf
2: ftp://ftp.ncdc.noaa.gov/pub/data/nsrdb-solar/SUNY-gridded-data/
3: http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/

_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users