On Tue, 2008-11-11 at 11:13 -0600, Phil Cryer wrote:
> On Tue, 2008-11-11 at 17:18 +0100, Posthumus, Etienne wrote:
> > We are in the process of migrating several hundred gigabytes of
> > repository content from a CMS to a Fedora 3.x installation.
> > One of the issues that we have is the decision whether to store the
> > assets (mostly PDF files at the moment) as managed or external
> > content.
> > Some of the PDF files can be several hundred megabytes in size.
> >
> > The strategy for the conversion (until now) was to create FOXML
> > on-disk with several datastreams embedded, and then do ingest using
> > the client command-line scripts. With the large PDF files embedded as
> > datastreams, the Java client crashes with out of memory errors, even
> > when I increase the heap size to seemingly sufficient sizes ( -Xms512m
> > -Xmx640m)
>
> This is similar to what I did with our Tropicos Images collection - I
> didn't want to bring in all of the images, they amounted to over a TB,
> so instead I use a link to the image that I ingest to fedora as a
> referenced datastream, then I have a script that creates a thumbnail of
> the image (if one is accessible) and then ingest that thumbnail as a
> managed datastream.
>
> You could consider making a thumbnail of the pdf as the managed, and a
> link to the 'real' one on the filesystem or url as a referenced one.
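To make the referenced-datastream idea a bit more concrete, here is a rough Python sketch of building a FOXML datastream element that only points at the PDF by URL, so the bytes never have to be embedded in the FOXML file. The function name, the datastream ID and the example.org URL are all made up for illustration. The element and attribute names (CONTROL_GROUP, datastreamVersion, contentLocation) follow the FOXML 1.1 format as I understand it, so please check the output against the schema before ingesting anything.

```python
import xml.etree.ElementTree as ET

# FOXML namespace; registering the prefix keeps the serialized output readable.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
ET.register_namespace("foxml", FOXML_NS)


def referenced_datastream(ds_id, label, mime_type, url, control_group="E"):
    """Build a FOXML datastream element that references content by URL.

    Hypothetical helper, not part of any Fedora client library.
    control_group is "E" (external) or "R" (redirect); in both cases
    the content itself stays outside the FOXML document.
    """
    if control_group not in ("E", "R"):
        raise ValueError("control_group must be 'E' or 'R' for referenced content")

    datastream = ET.Element(
        f"{{{FOXML_NS}}}datastream",
        {"ID": ds_id, "STATE": "A", "CONTROL_GROUP": control_group,
         "VERSIONABLE": "true"},
    )
    version = ET.SubElement(
        datastream,
        f"{{{FOXML_NS}}}datastreamVersion",
        {"ID": f"{ds_id}.0", "LABEL": label, "MIMETYPE": mime_type},
    )
    # The location is only a pointer; no binary data is placed in the XML.
    ET.SubElement(
        version,
        f"{{{FOXML_NS}}}contentLocation",
        {"TYPE": "URL", "REF": url},
    )
    return datastream


if __name__ == "__main__":
    element = referenced_datastream(
        "PDF", "Full text", "application/pdf",
        "http://example.org/files/report.pdf",  # placeholder URL
    )
    print(ET.tostring(element, encoding="unicode"))
```

The resulting element would go inside the foxml:digitalObject alongside the other datastreams. Since the FOXML then only carries a URL, the file stays small no matter how big the PDF is, which might sidestep the heap problem in the client.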
Also, another thing I considered was mounting the 'data' directory under
Fedora on a SAN so that storage wouldn't be an issue. That way everything
would still be managed via Fedora. I'm unsure whether this would be a good
or a bad thing; I didn't test it out, so it's just food for thought.

P

> >
> > So I wonder, what kind of content are other users storing? What are
> > the maximum sizes of stored datastreams observed? And do you ingest
> > them with FOXML in one go, or use something like an API-M call to add
> > the datastream after the object has already been created?
> >
> > Any thoughts appreciated.
> >
> > Etienne Posthumus
> > resident propellerhead
> > TU Delft Library
> > Netherlands
> > ---
> > http://www.library.tudeflt.nl/
> > -------------------------------------------------------------------------
> > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> > Build the coolest Linux based applications with Moblin SDK & win great
> > prizes
> > Grand prize is a trip for two to an Open Source event anywhere in the world
> > http://moblin-contest.org/redirect.php?banner_id=100&url=/
> > _______________________________________________
> > Fedora-commons-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
-- 
Phil Cryer | Open Source Dev Lead | web www.mobot.org | skype phil.cryer
