On Tue, 2008-11-11 at 11:13 -0600, Phil Cryer wrote:
> On Tue, 2008-11-11 at 17:18 +0100, Posthumus, Etienne wrote:
> > We are in the process of migrating several hundred gigabytes of
> > repository content from a CMS to a Fedora 3.x installation.
> > One of the issues that we have is the decision whether to store the
> > assets (mostly PDF files at the moment) as managed or external
> > content.
> > Some of the PDF files can be several hundred megabytes in size.
> >  
> > The strategy for the conversion (until now) was to create FOXML
> > on-disk with several datastreams embedded, and then do ingest using
> > the client command-line scripts. With the large PDF files embedded as
> > datastreams, the Java client crashes with out-of-memory errors, even
> > when I increase the heap to a seemingly sufficient size (-Xms512m
> > -Xmx640m).
> 
> This is similar to what I did with our Tropicos Images collection. I
> didn't want to bring in all of the images, since they amount to over
> a TB. Instead, I ingest a link to each image into Fedora as a
> referenced datastream; a script then creates a thumbnail of the image
> (if one is accessible) and ingests that thumbnail as a managed
> datastream.
> 
> You could consider ingesting a thumbnail of the PDF as the managed
> datastream, and a link to the 'real' one, on the filesystem or at a
> URL, as a referenced one.

Another thing I considered was mounting the 'data' directory under
Fedora on a SAN so that storage wouldn't be an issue. That way
everything would still be managed via Fedora. I'm unsure whether this
would be a good or bad thing; I didn't test it, so it's just food for
thought.
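
To make the referenced-datastream idea quoted above more concrete,
here is a minimal Python sketch that writes a FOXML 1.1 object whose
PDF datastream points at a URL instead of embedding the file. The
function name build_foxml, the PID "demo:1", the datastream ID "PDF"
and the example.org URL are all made up for illustration; this is not
the script I actually use. As far as I know, CONTROL_GROUP "E" keeps
the content external, and "M" with a contentLocation asks Fedora to
fetch the file at ingest time, but treat that as an assumption and
check it against your installation.

```python
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"
MODEL_NS = "info:fedora/fedora-system:def/model#"
ET.register_namespace("foxml", FOXML_NS)


def q(tag):
    """Qualify a tag name with the FOXML namespace."""
    return "{%s}%s" % (FOXML_NS, tag)


def build_foxml(pid, label, pdf_url, control_group="E"):
    """Return FOXML text whose PDF datastream references pdf_url.

    Hypothetical helper: only a contentLocation is written, so the
    large file never has to be embedded in the XML itself.
    """
    if control_group not in ("E", "R", "M"):
        raise ValueError("expected control group E, R or M")

    obj = ET.Element(q("digitalObject"), {"VERSION": "1.1", "PID": pid})

    # Minimal object properties: state and label.
    props = ET.SubElement(obj, q("objectProperties"))
    ET.SubElement(props, q("property"),
                  {"NAME": MODEL_NS + "state", "VALUE": "Active"})
    ET.SubElement(props, q("property"),
                  {"NAME": MODEL_NS + "label", "VALUE": label})

    # The datastream carries a pointer to the PDF, not the PDF bytes.
    ds = ET.SubElement(obj, q("datastream"),
                       {"ID": "PDF", "STATE": "A",
                        "CONTROL_GROUP": control_group,
                        "VERSIONABLE": "true"})
    version = ET.SubElement(ds, q("datastreamVersion"),
                            {"ID": "PDF.0", "LABEL": label,
                             "MIMETYPE": "application/pdf"})
    ET.SubElement(version, q("contentLocation"),
                  {"TYPE": "URL", "REF": pdf_url})

    return ET.tostring(obj, encoding="unicode")


if __name__ == "__main__":
    print(build_foxml("demo:1", "Sample report",
                      "http://example.org/files/report.pdf"))
```

The resulting file stays a few hundred bytes regardless of the PDF
size, which should sidestep the client-side memory problem; whether
the server copes with fetching very large files is something I have
not tested.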

P

> >  
> > So I wonder, what kind of content are other users storing? What are
> > the maximum sizes of stored datastreams observed? And do you ingest
> > them with FOXML in one go, or use something like an API-M call to add
> > the datastream after the object has already been created?
> >  
> > Any thoughts appreciated.
> >  
> > Etienne Posthumus
> > resident propellerhead
> > TU Delft Library
> > Netherlands
> > ---
> > http://www.library.tudeflt.nl/
> > -------------------------------------------------------------------------
> > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> > Build the coolest Linux based applications with Moblin SDK & win great prizes
> > Grand prize is a trip for two to an Open Source event anywhere in the world
> > http://moblin-contest.org/redirect.php?banner_id=100&url=/
> > _______________________________________________
> > Fedora-commons-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
-- 
Phil Cryer | Open Source Dev Lead | web www.mobot.org | skype phil.cryer

