Sorry to pester, but does anyone have thoughts on this?

Thanks,
Jamey

From: Jamey Wood <jamey.w...@nrel.gov>
Date: Tue, 25 Jan 2011 16:12:40 -0700
To: fedora-commons-users@lists.sourceforge.net
Subject: Fedora Commons for Large Datasets with Thousands of Files

Hello,

I'm trying to understand how Fedora Commons might be applied to managing 
datasets that:

  * May each consist of several thousand individual files
  * May have files organized in some meaningful hierarchical directory 
structure (e.g. "type1/subtype1/file1.csv")
  * Would benefit from some form of "whole-object" versioning (along the lines 
used by the eSciDoc project [1])

An example of one such dataset can be seen at [2] (with overview and 
documentation materials at [3]).

At first, I was assuming that each such dataset would be a single Fedora 
Commons object that would have a separate datastream for each file belonging to 
the dataset.  And then whole-object versioning could be implemented using a 
special datastream, as described in the eSciDoc paper.

But after looking through this mailing list's archives, I found the "Max number 
of datastreams of a object" thread (from December) where multiple people noted 
that having Fedora Commons objects with hundreds or thousands of datastreams 
probably isn't a good idea (although there isn't necessarily a hard limit 
preventing it).  So now I'm wondering how best to model these datasets in 
Fedora Commons.  Or is Fedora Commons simply not the right tool for this usage 
scenario?

One possibility would be to create a top-level Fedora Commons object that 
holds a pointer (URL) to the dataset's top-level data location, but doesn't 
attempt to track individual files within the dataset.  Then if a new revision 
of the dataset is published, that pointer would be redirected to the new 
location.  Is this a reasonable approach?  Or would it be considered bad 
practice?
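
To make the pointer idea concrete, it could be sketched roughly as follows.
This is plain Python used only for illustration: the class names, the `pid`
value, and the example URLs are hypothetical and are not part of any Fedora
Commons API.  Keeping the earlier locations around is an added assumption
here, so that older revisions stay reachable after the pointer moves.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DatasetRevision:
    """One published revision of the dataset (hypothetical structure)."""
    number: int
    location: str  # top-level URL under which this revision's files live


@dataclass
class DatasetPointer:
    """Stand-in for the top-level object; NOT a Fedora Commons class."""
    pid: str  # hypothetical object identifier
    revisions: List[DatasetRevision] = field(default_factory=list)

    def publish(self, location: str) -> DatasetRevision:
        """Record a new revision; it becomes the current pointer target."""
        revision = DatasetRevision(len(self.revisions) + 1, location)
        self.revisions.append(revision)
        return revision

    @property
    def current(self) -> Optional[DatasetRevision]:
        """The most recently published revision, or None if none exists."""
        return self.revisions[-1] if self.revisions else None


if __name__ == "__main__":
    # Made-up identifier and URLs, for illustration only.
    pointer = DatasetPointer(pid="example:dataset-1")
    pointer.publish("ftp://example.org/data/rev1/")
    pointer.publish("ftp://example.org/data/rev2/")
    print(pointer.current)
```

Note that individual files are never enumerated: the object only knows where
each revision starts, which is exactly the trade-off I'm asking about.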

Any thoughts on this or pointers towards general best practices would be 
appreciated.

Thanks,
Jamey

1: https://www.escidoc.org/media/docs/ges-versioning-article.pdf
2: ftp://ftp.ncdc.noaa.gov/pub/data/nsrdb-solar/SUNY-gridded-data/
3: http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/
