Re: [fcrepo-user] Fedora Commons for Large Datasets with Thousands of Files

Benjamin Armintor Fri, 28 Jan 2011 08:40:06 -0800

Jamey-
  I'm not sure I understand what you want to do, but you could have
many objects in your Fedora repository (some representing files, and
others representing hierarchical groupings of files) that are related
by triples in their respective RELS-EXT datastreams.  Your last
paragraph, though, makes it sound like you just want a document
versioning system.  Is that the case?
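
To make the multi-object idea concrete, here is a minimal sketch (not a
tested ingest script) that plans one object per directory and per file,
then builds the RELS-EXT RDF/XML linking each object to its parent.  The
"demo" PID namespace, the sequential PID numbering, the helper names
plan_objects and rels_ext, and the choice of the isMemberOf predicate are
all assumptions made for illustration; your content model may call for a
different predicate or PID scheme.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Fedora's relations-external ontology namespace (used in RELS-EXT).
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def plan_objects(paths, namespace="demo"):
    """Map every directory and file path to (pid, parent_pid).

    The empty path "" stands for the top-level dataset object.
    PIDs are hypothetical sequential identifiers, e.g. "demo:1".
    """
    pids = {"": f"{namespace}:1"}
    plan = {"": (pids[""], None)}
    for path in sorted(paths):
        parts = PurePosixPath(path).parts
        for depth in range(1, len(parts) + 1):
            node = "/".join(parts[:depth])
            if node not in pids:
                pids[node] = f"{namespace}:{len(pids) + 1}"
                parent = "/".join(parts[:depth - 1])
                plan[node] = (pids[node], pids[parent])
    return plan


def rels_ext(pid, parent_pid=None):
    """Return RELS-EXT RDF/XML for one object as a string."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{pid}"},
    )
    if parent_pid is not None:
        # Assumed predicate: isMemberOf points from child to parent.
        ET.SubElement(
            desc,
            f"{{{REL_NS}}}isMemberOf",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{parent_pid}"},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    plan = plan_objects(["type1/subtype1/file1.csv"])
    for path, (pid, parent) in plan.items():
        print(path or "(dataset root)", pid)
        print(rels_ext(pid, parent))
```

Each generated document would become the RELS-EXT datastream of its own
object, so no single object carries more than a handful of datastreams
and the directory hierarchy lives in the relationships instead.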


- Ben

On 1/28/11, Wood, Jamey <jamey.w...@nrel.gov> wrote:
> Sorry to pester, but does anyone have thoughts on this?
>
> Thanks,
> Jamey
>
> From: Jamey Wood <jamey.w...@nrel.gov>
> Date: Tue, 25 Jan 2011 16:12:40 -0700
> To: fedora-commons-users@lists.sourceforge.net
> Subject: Fedora Commons for Large Datasets with Thousands of Files
>
> Hello,
>
> I'm trying to understand how Fedora Commons might be applied to managing
> datasets that:
>
>   * May each consist of several thousand individual files
>   * May have files organized in some meaningful hierarchical directory
> structure (e.g. "type1/subtype1/file1.csv")
>   * Would benefit from some form of "whole-object" versioning (along the
> lines used by the eSciDoc project [1])
>
> An example of one such dataset can be seen at [2] (with overview and
> documentation materials at [3]).
>
> At first, I was assuming that each such dataset would be a single Fedora
> Commons object that would have a separate datastream for each file belonging
> to the dataset.  And then whole-object versioning could be implemented using
> a special datastream, as described in the eSciDoc paper.
>
> But after looking through this mailing list's archives, I found the "Max
> number of datastreams of a object" thread (from December) where multiple
> people noted that having Fedora Commons objects with hundreds or thousands
> of datastreams probably isn't a good idea (although there isn't necessarily
> a hard limit preventing it).  So now I'm wondering how to best model these
> datasets in Fedora Commons?  Or is Fedora Commons simply not the right tool
> for this usage scenario?
>
> One possibility I'm wondering about would be to just create some kind of
> top-level Fedora Commons object that has a pointer to the top-level data
> location (URL), but doesn't attempt to track individual files within the
> dataset.  Then if a new revision of the dataset is published, that top-level
> URL pointer might be directed to some new location.  Is this a reasonable
> approach?  Or would it be considered bad practice?
>
> Any thoughts on this or pointers towards general best practices would be
> appreciated.
>
> Thanks,
> Jamey
>
> 1: https://www.escidoc.org/media/docs/ges-versioning-article.pdf
> 2: ftp://ftp.ncdc.noaa.gov/pub/data/nsrdb-solar/SUNY-gridded-data/
> 3: http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/
>
> ------------------------------------------------------------------------------
> Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
> Finally, a world-class log management solution at an even better price-free!
> Download using promo code Free_Logger_4_Dev2Dev. Offer expires
> February 28th, so secure your free ArcSight Logger TODAY!
> http://p.sf.net/sfu/arcsight-sfd2d
> _______________________________________________
> Fedora-commons-users mailing list
> Fedora-commons-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>
