Folks -
We are experimenting with DSpace for housing our digitized newspaper
collection. So far the collection is about 68,620 PDF files totaling about
1.4TB, and we are currently using another product to present it to the
public. Each PDF file represents one newspaper issue, with multiple pages in
the PDF.
I am pondering moving it to DSpace, and I would also like to present
newspapers one page at a time, so that when a person makes a query, only
the pages that hold relevant data are presented and there is less to
download. Right now, you either download an entire issue as a PDF,
or nothing, and some of them are quite large.
The PDFs would be split into individual PDFs with one page each, and I
propose that each page be a collection unto itself. Here's my idea for
collection organization: the top community would be the paper name, then a
sub-community would be the year, then a sub-community under that is the
month, then collections for each day of the month. This is the only way I
can see that I would be able to return things down to the newspaper page
level. Each one-page collection would contain a multi-resolution TIFF file,
the PDF page file, a JPG thumbnail, and possibly a text file extracted from
the PDF (they are OCR'd already).
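To make the proposed layout concrete, here is a minimal sketch in Python of how one page would map onto that hierarchy. It only builds labels and file names and does not touch DSpace itself; the function names, the zero-padded label format, the file extensions, and the sample paper name are all assumptions made for illustration.

```python
from datetime import date


def page_path(paper, issue_date, page):
    """Return the proposed hierarchy for one page, top level first."""
    return [
        paper,                       # top community: the paper name
        f"{issue_date.year:04d}",    # sub-community: year
        f"{issue_date.month:02d}",   # sub-community: month
        f"{issue_date.day:02d}",     # collection: day of the month
        f"page-{page:03d}",          # one newspaper page (hypothetical label)
    ]


def page_files(stem):
    """Files proposed for each page; the extensions are assumed."""
    return {
        "tiff": stem + ".tif",        # multi-resolution TIFF
        "pdf": stem + ".pdf",         # single-page PDF
        "thumbnail": stem + ".jpg",   # JPG thumbnail
        "text": stem + ".txt",        # OCR text, possibly omitted
    }


if __name__ == "__main__":
    # "Example Gazette" is a made-up paper name.
    path = page_path("Example Gazette", date(1923, 4, 7), 2)
    print(" / ".join(path))
    print(page_files("-".join(path[1:])))
```

Every page then sits five levels deep, which is worth keeping in mind when thinking about how many containers the repository would have to hold.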
Now, my questions are: how many files can DSpace handle? I would guess
that after splitting the PDF files there could be maybe 6-10 times as many
as there are now, and that's going to grow over time. What are the
limits? Is this just a totally whacked-out idea, or am I going to be able
to build something useful if I continue along these lines?
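For a sense of scale, here is a rough back-of-the-envelope calculation in Python using the figures above. It assumes that "6-10 times as many" means an average of 6 to 10 pages per issue, that every page gets all four files (TIFF, PDF, JPG, text), and that 1.4TB is a decimal terabyte figure; none of this says anything about what DSpace can actually handle.

```python
ISSUES = 68_620           # current number of issue PDFs
TOTAL_BYTES = 1.4e12      # 1.4TB, assuming decimal terabytes
FILES_PER_PAGE = 4        # TIFF + PDF + JPG + text (assumed all present)


def estimate(pages_per_issue, files_per_page=FILES_PER_PAGE):
    """Return (page count, total file count) after splitting."""
    pages = ISSUES * pages_per_issue
    return pages, pages * files_per_page


if __name__ == "__main__":
    avg_mb = TOTAL_BYTES / ISSUES / 1e6
    print(f"average issue size: about {avg_mb:.1f} MB")
    for multiplier in (6, 10):
        pages, files = estimate(multiplier)
        print(f"x{multiplier}: {pages:,} pages, {files:,} files")
```

Under those assumptions the split collection would hold roughly 410,000 to 690,000 pages and about 1.6 to 2.7 million individual files, before any future growth.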
Thanks -
LibraryMark
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech